Flow entries on OpenStack's br-tun interface

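Introduction

The other day I confirmed that an OpenStack environment can be built with GNS3 + Nested KVM + OpenStack with RDO.
In that setup I built a virtual subnet over VXLAN, and taken at face value that topology contains a loop.
In this post I look at the flow entries installed on br-tun to see how OpenStack's tunnel-based virtual subnets prevent loops.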

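First look

First, let's look at the virtual switch configuration, using the OpenStack virtual subnet from the previous post as the example.

Virtual switch configuration as seen from compute02 (example)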

# ovs-vsctl show
a01814aa-f4ac-4848-af80-bcc45568e4aa
    Bridge br-int
        fail_mode: secure
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "qvodfe516d1-4c"
            tag: 3
            Interface "qvodfe516d1-4c"
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        Port "vxlan-c0a86502"
            Interface "vxlan-c0a86502"
                type: vxlan
                options: {in_key=flow, local_ip="192.168.101.12", out_key=flow, remote_ip="192.168.101.2"}
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-c0a8650b"
            Interface "vxlan-c0a8650b"
                type: vxlan
                options: {in_key=flow, local_ip="192.168.101.12", out_key=flow, remote_ip="192.168.101.11"}
    ovs_version: "1.11.0"
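
A side note on the port names: in this output, the hex suffix of each vxlan port matches that port's remote_ip (c0a86502 and 192.168.101.2, c0a8650b and 192.168.101.11). Here is a minimal sketch that decodes it. Treating this as a general naming rule is my assumption based only on the output above, and the function name is made up for illustration.

```python
import ipaddress


def remote_ip_from_port_name(port_name: str) -> str:
    """Decode the hex suffix of a 'vxlan-XXXXXXXX' port name into dotted-quad form.

    Assumption: the suffix is the tunnel's remote IPv4 address in hex,
    as the ovs-vsctl output above suggests.
    """
    prefix, _, hex_part = port_name.partition("-")
    if prefix != "vxlan" or len(hex_part) != 8:
        raise ValueError(f"unexpected port name: {port_name!r}")
    return str(ipaddress.IPv4Address(int(hex_part, 16)))


for name in ("vxlan-c0a86502", "vxlan-c0a8650b"):
    print(name, "->", remote_ip_from_port_name(name))
# vxlan-c0a86502 -> 192.168.101.2
# vxlan-c0a8650b -> 192.168.101.11
```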

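The problem

Now, if you simply connect Open vSwitch VXLAN tunnels between multiple endpoints, you would expect a loop like the one in the figure below.

Yet OpenStack clearly does not end up with a network paralyzed by loops.
The simple fix would be: "packets that arrive from a tunnel are never flooded back into tunnels."
And presumably the flow entries OpenStack installs do exactly that.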
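
To make the concern concrete, here is a toy model of one broadcast frame in a three-node full mesh. It is only a sketch: the node names are hypothetical, MAC learning is not modeled, and "naive" simply means every node re-floods a frame to all tunnels except the one it arrived on.

```python
from collections import deque

# Hypothetical node names; every pair of nodes is connected by a tunnel (full mesh).
NODES = ["network", "compute01", "compute02"]


def flood(origin: str, split_horizon: bool, max_events: int = 1000) -> int:
    """Count tunnel transmissions caused by one broadcast from a VM on `origin`.

    Returns max_events if the frame is still circulating when the cap is hit.
    """
    # Each entry: (node holding the frame, node it came from; None = local VM)
    queue = deque([(origin, None)])
    transmissions = 0
    while queue and transmissions < max_events:
        node, came_from = queue.popleft()
        if came_from is not None and split_horizon:
            # Arrived via a tunnel: deliver locally only, never back into a tunnel.
            continue
        for peer in NODES:
            if peer in (node, came_from):
                continue  # not to ourselves, and not back out the ingress tunnel
            queue.append((peer, node))
            transmissions += 1
    return min(transmissions, max_events)


print(flood("compute02", split_horizon=False))  # 1000 (hits the cap: the frame loops forever)
print(flood("compute02", split_horizon=True))   # 2 (one copy per tunnel, then it stops)
```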

Checking the answer

So let's look at the flow entries on the same virtual switch.

Flow entries of the virtual switch as seen from compute02 (example)

# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=115832.009s, table=0, n_packets=457, n_bytes=47352, idle_age=1391, hard_age=65534, priority=1,in_port=3 actions=resubmit(,3)
 cookie=0x0, duration=115902.012s, table=0, n_packets=556, n_bytes=57416, idle_age=1391, hard_age=65534, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0x0, duration=115900.908s, table=0, n_packets=54, n_bytes=5064, idle_age=23814, hard_age=65534, priority=1,in_port=2 actions=resubmit(,3)
 cookie=0x0, duration=115901.967s, table=0, n_packets=4, n_bytes=300, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=115901.888s, table=1, n_packets=492, n_bytes=51340, idle_age=1391, hard_age=65534, priority=1,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x0, duration=115901.754s, table=1, n_packets=64, n_bytes=6076, idle_age=23814, hard_age=65534, priority=1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)
 cookie=0x0, duration=115901.675s, table=2, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=110244.491s, table=3, n_packets=300, n_bytes=31148, idle_age=1391, hard_age=65534, priority=1,tun_id=0xb actions=mod_vlan_vid:3,resubmit(,10)
 cookie=0x0, duration=115901.619s, table=3, n_packets=28, n_bytes=2520, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=115901.569s, table=10, n_packets=483, n_bytes=49896, idle_age=1391, hard_age=65534, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
 cookie=0x0, duration=115901.529s, table=20, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=resubmit(,21)
 cookie=0x0, duration=110244.534s, table=21, n_packets=18, n_bytes=1724, idle_age=23814, hard_age=65534, dl_vlan=3 actions=strip_vlan,set_tunnel:0xb,output:3,output:2
 cookie=0x0, duration=115901.489s, table=21, n_packets=14, n_bytes=1036, idle_age=65534, hard_age=65534, priority=0 actions=drop

To make this easier to read, here it is with the unneeded fields removed, the columns aligned, and the entries sorted and grouped by table.

table=0,  priority=1, in_port=1                                  actions=resubmit(,1)
table=0,  priority=1, in_port=2                                  actions=resubmit(,3)
table=0,  priority=1, in_port=3                                  actions=resubmit(,3)
table=0,  priority=0                                             actions=drop

table=1,  priority=1, dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
table=1,  priority=1, dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)

table=2,  priority=0                                             actions=drop

table=3,  priority=1, tun_id=0xb                                 actions=mod_vlan_vid:3,resubmit(,10)
table=3,  priority=0                                             actions=drop

table=10, priority=1                                              actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1

table=20, priority=0                                             actions=resubmit(,21)

table=21,             dl_vlan=3                                  actions=strip_vlan,set_tunnel:0xb,output:3,output:2
table=21, priority=0                                             actions=drop
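
The tidied table can also be traced by hand with a small model. The following is a rough Python sketch of the pipeline, not how Open vSwitch actually evaluates flows. It assumes in_port=1 is patch-int (the uplink toward br-int) and in_port=2 and 3 are the two VXLAN ports, which the table implies but does not state. The MAC addresses are made up, and the hard_timeout of the learned entries is ignored.

```python
PATCH_INT = 1          # assumption: in_port=1 is patch-int (uplink toward br-int)
TUNNEL_PORTS = (2, 3)  # assumption: in_port=2,3 are the VXLAN ports
LOCAL_VLAN, TUN_ID = 3, 0xB

# Entries the learn action would install into table 20: (vlan, mac) -> tunnel port
learned = {}


def br_tun(in_port, dl_src, dl_dst, vlan=None, tun_id=None):
    """Return the list of output ports for one frame, following the tidied table."""
    # table 0: classify by ingress port
    if in_port == PATCH_INT:
        # table 1: test the multicast bit of the destination MAC
        is_multicast = int(dl_dst.split(":")[0], 16) & 0x01
        if not is_multicast:
            # table 20: learned unicast entries; a miss falls through to table 21
            port = learned.get((vlan, dl_dst))
            if port is not None:
                return [port]
        # table 21: flood to every tunnel if the VLAN is known, else drop
        return [3, 2] if vlan == LOCAL_VLAN else []
    if in_port in TUNNEL_PORTS:
        # table 3: map the tunnel ID to the local VLAN, drop unknown IDs
        if tun_id != TUN_ID:
            return []
        # table 10: learn where the sender lives, then output to the uplink only
        learned[(LOCAL_VLAN, dl_src)] = in_port
        return [PATCH_INT]
    return []  # table 0, priority=0: drop


vm_a, vm_b = "fa:16:3e:00:00:0a", "fa:16:3e:00:00:0b"  # hypothetical MACs
bcast = "ff:ff:ff:ff:ff:ff"
print(br_tun(1, vm_a, bcast, vlan=3))      # [3, 2]  local broadcast goes to both tunnels
print(br_tun(2, vm_b, bcast, tun_id=0xB))  # [1]     from a tunnel: uplink only
print(br_tun(1, vm_a, vm_b, vlan=3))       # [2]     learned unicast uses one tunnel
```

The point to notice is that no branch taken by a frame from port 2 or 3 ever returns port 2 or 3.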

Following the text is painful, so here is a rough sketch of it.

So, as shown, the flows implement "packets that arrive from a tunnel are not flooded; they are sent only to the uplink (table=10)."
In other words:

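If you view br-tun as a single interface, this is equivalent to split horizon.
The same behavior can be achieved without br-tun (that is, by creating the VXLAN interfaces directly on br-int), but then the flow entries have to grow with the number of virtual machines, and the output actions of existing flow entries have to be modified as well.
Dedicating br-tun to the tunnel interfaces cleanly separates the tunnel interfaces from the uplink (toward br-int) interface, which keeps the flow entries tidy.
Of course, this assumes a full-mesh tunnel topology; with something like a star topology there will be places packets cannot reach, so you need to choose accordingly.
This design was naturally already in place back with GRE: it works for VXLAN too, but it is common to any unicast tunnel interface.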
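
The full-mesh caveat is easy to check with a toy model. The sketch below uses hypothetical node names and assumes strict split horizon: a node sends a locally originated broadcast into all of its tunnels and never re-forwards a frame that arrived through a tunnel.

```python
def reached(tunnels, origin):
    """Nodes that receive a broadcast from a VM on `origin` under split horizon.

    Frames arriving through a tunnel are never sent into another tunnel,
    so only the origin's direct tunnel peers ever see the frame.
    """
    peers = set()
    for a, b in tunnels:
        if a == origin:
            peers.add(b)
        elif b == origin:
            peers.add(a)
    return peers


full_mesh = [("network", "compute01"), ("network", "compute02"), ("compute01", "compute02")]
star = [("network", "compute01"), ("network", "compute02")]  # hub: network

print(sorted(reached(full_mesh, "compute01")))  # ['compute02', 'network']
print(sorted(reached(star, "compute01")))       # ['network']
```

In the star case compute02 never sees the frame, because the hub would have to forward it from one tunnel into another.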

Summary

By dedicating br-tun to the tunnel interfaces, OpenStack implements split horizon with only a handful of flow entries.

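Naturally, if the br-tun flow entries are wiped with ovs-ofctl del-flows br-tun and the bridge ends up doing plain L2 switching instead (for example through a single NORMAL flow), you can create a loop while the topology itself stays unchanged. So when developing an SDN controller, take care not to delete the br-tun flow entries by mistake.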

And that's the story.