Tuesday, July 15, 2014

L3 Troubleshooting in QFabric: routing issues


Last week I had a connectivity problem in the data center network that turned out to be a routing issue. Initially, hosts in vlan 46 couldn't reach one specific server in vlan 30, with IP 192.168.30.190. My first thought was to move the L3 configuration of vlan 30 to QFabric, since that server was connected locally to it.

First, I checked the ARP entries (MAC addresses) for that particular vlan to confirm nothing was there:

root@qfabric# run show arp | match vlan.30


Then I checked the routed interface for vlan 46, which includes VRRP statements, to recall which addresses were configured on it:

root@qfabric# show interfaces vlan.46
family inet {
    address 192.168.46.7/24 {
        vrrp-group 46 {
            virtual-address 192.168.46.10;
            priority 310;
            accept-data;
        }
    }
}

Next, I checked the VRRP status of interface vlan.46 to make sure it was working as expected with the configuration lines above:

root@qfabric# run show vrrp
Interface     State       Group   VR state VR Mode   Timer    Type   Address
vlan.46       up             46   master   Active      A  0.663 lcl    192.168.46.7
                                                                vip    192.168.46.10


Then I pinged the local server in vlan 30, using the address of the routed interface in vlan 46 as the source:

root@qfabric> ping 192.168.30.190 source 192.168.46.7
PING 192.168.30.190 (192.168.30.190): 56 data bytes
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
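
Note that the reply lines above come from 192.168.204.190, not from the address that was pinged. When going through saved ping output, this kind of mismatch can be flagged with a few lines of Python. This is only a minimal sketch: the helper name `unexpected_responders` and the regular expression are my own assumptions, written against the output format shown here, and are not part of any Junos tooling.

```python
import re

# Matches reply lines such as "92 bytes from 192.168.204.190: ttl=62"
# and captures the address that sent the reply.
REPLY_RE = re.compile(r"^\d+ bytes from (\d+\.\d+\.\d+\.\d+):")


def unexpected_responders(ping_output, target):
    """Return the set of addresses that replied but are not the pinged target."""
    responders = set()
    for line in ping_output.splitlines():
        match = REPLY_RE.match(line.strip())
        if match and match.group(1) != target:
            responders.add(match.group(1))
    return responders


# Sample taken from the ping output above (shortened to two reply lines).
sample = """\
PING 192.168.30.190 (192.168.30.190): 56 data bytes
92 bytes from 192.168.204.190: ttl=62
92 bytes from 192.168.204.190: ttl=62
"""

print(unexpected_responders(sample, "192.168.30.190"))
```

An empty set means every reply came from the target itself; anything else is a hint that some other device along the path is answering.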

As the ping output shows, none of the replies came from the server's own IP address; they all came from 192.168.204.190. So why can't it ping? Let's see where its route is coming from:

root@qfabric# run show route 192.168.30.190

inet.0: 70 destinations, 70 routes (70 active, 0 holddown, 0 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

192.168.30.190/32   *[OSPF/150] 1d 00:41:12, metric 70, tag 0
                      to 192.168.204.193 via vlan.204
                    > to 192.168.204.197 via vlan.205

[edit]

root@qfabric# run show ospf neighbor | match (vlan.204|vlan.205)
Address          Interface              State     ID               Pri  Dead
192.168.204.193   vlan.204               Full      192.168.204.3       1    61
192.168.204.197   vlan.205               Full      192.168.204.4       1    77


[edit]

Its route was coming from a peer device with ID 192.168.204.3. At this point, I had collected all the information I needed to move the L3 configuration of vlan 30 to QFabric. To do that, I issued the following commands, which enable a routed VLAN interface for vlan 30:

set interfaces vlan.30 family inet address 192.168.30.7/24 vrrp-group 30 virtual-address 192.168.30.10
set interfaces vlan.30 family inet address 192.168.30.7/24 vrrp-group 30 priority 310
set interfaces vlan.30 family inet address 192.168.30.7/24 vrrp-group 30 accept-data
set protocols ospf area 0.0.0.0 interface vlan.30 passive
set vlans v30 l3-interface vlan.30

After committing the new configuration lines above, I checked the VRRP status for vlan 30:

root@qfabric# run show vrrp
Interface     State       Group   VR state VR Mode   Timer    Type   Address
vlan.30       up             30   master   Active      A  0.125 lcl    192.168.30.7
                                                                vip    192.168.30.10

It looks good. I could now see ARP entries (MAC addresses) in this vlan:

root@qfabric# run show arp | match vlan.30
ae:8d:eb:54:a0:c7 192.168.30.44    192.168.30.44              vlan.30             none
a0:26:55:aa:70:c6 192.168.30.46    192.168.30.46              vlan.30             none
a0:1a:4b:f3:66:cc 192.168.30.49    192.168.30.49              vlan.30             none
a4:85:64:ee:23:c6 192.168.30.106   192.168.30.106             vlan.30             none
a8:b5:99:91:ef:ca 192.168.30.111   192.168.30.111             vlan.30             none
a4:85:64:ee:23:ce 192.168.30.115   192.168.30.115             vlan.30             none
a4:85:64:ee:13:c6 192.168.30.116   192.168.30.116             vlan.30             none
ae:4e:7d:0a:d3:c4 192.168.30.120   192.168.30.120             vlan.30             none
a0:26:55:a9:5e:ce 192.168.30.121   192.168.30.121             vlan.30             none
a4:85:64:ee:33:c6 192.168.30.122   192.168.30.122             vlan.30             none
a4:85:64:ee:d4:ce 192.168.30.123   192.168.30.123             vlan.30             none
a4:85:64:ee:13:ce 192.168.30.128   192.168.30.128             vlan.30             none
a4:85:64:ee:a4:ca 192.168.30.129   192.168.30.129             vlan.30             none

Checking the route again to make sure it is no longer being received through OSPF:

root@qfabric# run show route 192.168.30.190

inet.0: 70 destinations, 70 routes (70 active, 0 holddown, 0 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

192.168.30.0/24     *[Direct/0] 00:36:58
                    > via vlan.30

[edit]

A direct route means the server's subnet is directly connected to QFabric. That makes sense, so let's ping again:

root@qfabric# run ping 192.168.30.190 source 192.168.46.7
PING 192.168.30.190 (192.168.30.190): 56 data bytes
64 bytes from 192.168.30.190: icmp_seq=0 ttl=255 time=1846 micro seconds
64 bytes from 192.168.30.190: icmp_seq=1 ttl=255 time=1861 micro seconds
64 bytes from 192.168.30.190: icmp_seq=2 ttl=255 time=1718 micro seconds
64 bytes from 192.168.30.190: icmp_seq=3 ttl=255 time=1969 micro seconds
64 bytes from 192.168.30.190: icmp_seq=4 ttl=255 time=1607 micro seconds
[abort]

[edit]
root@qfabric#

As it shows, I was able to reach server 192.168.30.190 when pinging locally from QFabric; however, other servers in vlan 46 still couldn't reach it. After some additional troubleshooting, I discovered a static route on the OSPF peer device with ID 192.168.204.3 (SwitchL3):

SwitchL3#show ip route 192.168.30.190
Routing entry for 192.168.30.190/32
  Known via "static", distance 1, metric 0
  Redistributing via ospf 10, ospf 100
  Routing Descriptor Blocks:
  * 192.168.204.190
      Route metric is 0, traffic share count is 1

The next-hop address 192.168.204.190 is an external firewall, the same address that answered the first ping earlier in this article. Once that static route was removed, connectivity went back to normal. Other devices in area 0 couldn't reach the server because they were receiving a host route for 192.168.30.190/32 from device "SwitchL3", and that more specific route won over the route QFabric was advertising for the complete vlan, 192.168.30.0/24. Within QFabric itself, the route lookup resolved to the direct route for 192.168.30.0/24 (as the output above shows), which is why the ping worked fine only from there.
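
To illustrate why the other devices kept following the stale entry, here is a minimal Python sketch of longest-prefix matching, which is how routers choose among overlapping prefixes: the most specific matching route wins. The route table below is a simplified assumption of what another router in area 0 might have held while the static route existed, and `best_route` is a hypothetical helper, not a model of any vendor's actual route selection.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (prefix, source) pair with the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, source in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, source))
    if not matches:
        return None
    # The most specific route is the one with the longest prefix length.
    network, source = max(matches, key=lambda entry: entry[0].prefixlen)
    return str(network), source


# Assumed, simplified view from another router in area 0.
routes_with_static = [
    ("192.168.30.0/24", "advertised by QFabric"),
    ("192.168.30.190/32", "static route redistributed by SwitchL3"),
]
# Same table once the static route has been removed on SwitchL3.
routes_without_static = routes_with_static[:1]

print(best_route(routes_with_static, "192.168.30.190"))
print(best_route(routes_without_static, "192.168.30.190"))
print(best_route(routes_with_static, "192.168.30.44"))
```

With the /32 present, only traffic to 192.168.30.190 is pulled toward SwitchL3, while the rest of the subnet (192.168.30.44, for example) still follows the /24. That matches what I saw: a single server unreachable while its neighbours in the same vlan were fine.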