Tuesday, July 15, 2014

L3 Troubleshooting in QFabric: routing issues


Last week I had a connectivity issue in the data center network that turned out to be a routing issue. At first, hosts in vlan 46 couldn't reach one specific server in vlan 30, with IP 192.168.30.190. My first thought was to move the L3 configuration of vlan 30 to QFabric, since that server was connected locally to it.

First, I checked the ARP entries (MAC addresses) for that vlan to confirm nothing was there:

root@qfabric# run show arp | match vlan.30


Then, I checked the routed interface of vlan 46, which includes VRRP statements, to recall which addresses were configured on it:

root@qfabric# show interfaces vlan.46
family inet {
    address 192.168.46.7/24 {
        vrrp-group 46 {
            virtual-address 192.168.46.10;
            priority 310;
            accept-data;
        }
    }
}

Checking the VRRP status of interface vlan.46 to make sure it is working fine with the configuration lines above:

root@qfabric# run show vrrp
Interface     State       Group   VR state VR Mode   Timer    Type   Address
vlan.46       up             46   master   Active      A  0.663 lcl    192.168.46.7
                                                                vip    192.168.46.10


Ping from the address of the routed interface of vlan 46 to the local server in vlan 30:

root@qfabric> ping 192.168.30.190 source 192.168.46.7
PING 192.168.30.190 (192.168.30.190): 56 data bytes
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
92 bytes from 192.168.204.190: ttl=62 
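
The telling detail in the output above is that the answers come from 192.168.204.190 rather than from the address being pinged. A minimal Python sketch of that check, assuming ping output in the format shown here (the function name is my own, not part of any tool):

```python
import re

# Sample copied from the ping output above (shortened to three replies).
PING_OUTPUT = """\
PING 192.168.30.190 (192.168.30.190): 56 data bytes
92 bytes from 192.168.204.190: ttl=62
92 bytes from 192.168.204.190: ttl=62
92 bytes from 192.168.204.190: ttl=62
"""

def unexpected_responders(output):
    """Return the addresses that answered but are not the ping target."""
    # The target is the address in parentheses on the PING header line.
    target = re.search(r"^PING \S+ \(([\d.]+)\)", output, re.MULTILINE).group(1)
    # Every "N bytes from A.B.C.D:" line names the device that answered.
    responders = set(re.findall(r"^\d+ bytes from ([\d.]+):", output, re.MULTILINE))
    return responders - {target}

print(unexpected_responders(PING_OUTPUT))  # {'192.168.204.190'}
```

An empty set would mean every reply came from the target itself.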

As it shows, no reply comes back from the server's own IP address; the responses come from 192.168.204.190 instead. So why can't it ping? Let's see where its route is coming from:

root@qfabric# run show route 192.168.30.190

inet.0: 70 destinations, 70 routes (70 active, 0 holddown, 0 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

192.168.30.190/32   *[OSPF/150] 1d 00:41:12, metric 70, tag 0
                      to 192.168.204.193 via vlan.204
                    > to 192.168.204.197 via vlan.205

[edit]

root@qfabric# run show ospf neighbor | match "(vlan.204|vlan.205)"
Address          Interface              State     ID               Pri  Dead
192.168.204.193   vlan.204               Full      192.168.204.3       1    61
192.168.204.197   vlan.205               Full      192.168.204.4       1    77


[edit]

Its route is coming from a peer device with ID 192.168.204.3. At this point, I had collected all the information I needed to move the L3 configuration of vlan 30 to QFabric. To achieve that, I issued the following commands, which enable the routed vlan interface in vlan 30.

set interfaces vlan.30 family inet address 192.168.30.7/24 vrrp-group 30 virtual-address 192.168.30.10
set interfaces vlan.30 family inet address 192.168.30.7/24 vrrp-group 30 priority 310
set interfaces vlan.30 family inet address 192.168.30.7/24 vrrp-group 30 accept-data
set protocols ospf area 0.0.0.0 interface vlan.30 passive
set vlans v30 l3-interface vlan.30

After committing the new configuration lines above, I checked the VRRP status for vlan 30:

root@qfabric# run show vrrp
Interface     State       Group   VR state VR Mode   Timer    Type   Address
vlan.30       up             30   master   Active      A  0.125 lcl    192.168.30.7
                                                                vip    192.168.30.10

It looks good. Now I was able to see MAC addresses in this vlan:

root@qfabric# run show arp | match vlan.30
ae:8d:eb:54:a0:c7 192.168.30.44    192.168.30.44              vlan.30             none
a0:26:55:aa:70:c6 192.168.30.46    192.168.30.46              vlan.30             none
a0:1a:4b:f3:66:cc 192.168.30.49    192.168.30.49              vlan.30             none
a4:85:64:ee:23:c6 192.168.30.106   192.168.30.106             vlan.30             none
a8:b5:99:91:ef:ca 192.168.30.111   192.168.30.111             vlan.30             none
a4:85:64:ee:23:ce 192.168.30.115   192.168.30.115             vlan.30             none
a4:85:64:ee:13:c6 192.168.30.116   192.168.30.116             vlan.30             none
ae:4e:7d:0a:d3:c4 192.168.30.120   192.168.30.120             vlan.30             none
a0:26:55:a9:5e:ce 192.168.30.121   192.168.30.121             vlan.30             none
a4:85:64:ee:33:c6 192.168.30.122   192.168.30.122             vlan.30             none
a4:85:64:ee:d4:ce 192.168.30.123   192.168.30.123             vlan.30             none
a4:85:64:ee:13:ce 192.168.30.128   192.168.30.128             vlan.30             none
a4:85:64:ee:a4:ca 192.168.30.129   192.168.30.129             vlan.30             none

Checking the route again to make sure it is no longer being received through OSPF:

root@qfabric# run show route 192.168.30.190

inet.0: 70 destinations, 70 routes (70 active, 0 holddown, 0 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

192.168.30.0/24     *[Direct/0] 00:36:58
                    > via vlan.30

[edit]

A direct route means the server's subnet is directly connected to QFabric. That makes sense again, so let's ping:

root@qfabric# run ping 192.168.30.190 source 192.168.46.7
PING 192.168.30.190 (192.168.30.190): 56 data bytes
64 bytes from 192.168.30.190: icmp_seq=0 ttl=255 time=1846 micro seconds
64 bytes from 192.168.30.190: icmp_seq=1 ttl=255 time=1861 micro seconds
64 bytes from 192.168.30.190: icmp_seq=2 ttl=255 time=1718 micro seconds
64 bytes from 192.168.30.190: icmp_seq=3 ttl=255 time=1969 micro seconds
64 bytes from 192.168.30.190: icmp_seq=4 ttl=255 time=1607 micro seconds
[abort]

[edit]
root@qfabric#

As it shows, I was able to reach server 192.168.30.190 when pinging locally from QFabric; however, other servers in vlan 46 still couldn't reach it. After some additional troubleshooting, I discovered a static route on the OSPF peer device with ID 192.168.204.3 (SwitchL3):

SwitchL3#show ip route 192.168.30.190
Routing entry for 192.168.30.190/32
  Known via "static", distance 1, metric 0
  Redistributing via ospf 10, ospf 100
  Routing Descriptor Blocks:
  * 192.168.204.190
      Route metric is 0, traffic share count is 1

The next hop 192.168.204.190 is an external firewall, the same address that answered the first ping earlier in this article. Once that static route was removed, connectivity went back to normal. Other devices in area 0 couldn't reach the server because they were receiving a specific /32 route for 192.168.30.190 from "SwitchL3", and that route is more specific than, and therefore chosen over, the route QFabric advertises for the whole vlan, 192.168.30.0/24. Within QFabric, local or direct routes are always preferred, which is why ping worked fine only from there.
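
The reason the other routers kept following the static route is longest-prefix matching: a /32 always wins over a /24 that covers the same address. Here is a small Python sketch of that selection using the standard ipaddress module. The route table is a simplified assumption of what another router in area 0 would have seen, and the next-hop labels are only illustrative.

```python
import ipaddress

# Assumed view from another router in area 0 while the static route existed.
ROUTES = {
    "192.168.30.0/24": "QFabric (subnet of vlan 30)",
    "192.168.30.190/32": "SwitchL3 (redistributed static route)",
}

def best_route(destination, routes):
    """Return the matching prefix with the longest prefix length."""
    dest = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), routes[str(best)]

# The problem server follows the /32 towards SwitchL3 ...
print(best_route("192.168.30.190", ROUTES))
# ... while any other host in vlan 30 follows the /24 towards QFabric.
print(best_route("192.168.30.44", ROUTES))
```

Removing the /32 entry from the dictionary makes both lookups resolve to the /24, which mirrors what happened once the static route was deleted.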

Sunday, June 22, 2014

Open linux shell in QFabric

I personally like Linux operating systems such as Debian, Ubuntu, Fedora and others. One of the reasons is the Linux shell, originally called the Unix shell; the most common ones are the Bourne shell (sh), the Bourne-Again shell (bash) and the C shell (csh). Now, can you imagine a better tool in QFabric than a Linux-type shell with a whole bunch of Linux tools in it? The good news is that it's possible, and in this article I show you how to open it up.

First, log in to the network node group and issue the start shell command. Once there, issue the su command and enter the root password (that is, if you want full access). Below you can see what this great tool looks like:

root@qfabric> request component login NW-NG-0
Warning: Permanently added 'dcfnode-default---nw-ine-0,169.254.192.34' (RSA) to the list of known hosts.
Password:
--- JUNOS 13.1X50-D15.1 built 2013-10-31 14:06:44 UTC
{master}
qfabric-admin@NW-NG-0> start shell
% su
Password:
root@NW-NG-0%
root@NW-NG-0% pwd
/var/home/qfabric-admin

You can also log in to any redundant server node group, such as RSNG1 in the following example:

root@qfabric> request component login RSNG1
Warning: Permanently added 'dcfnode-default-rsng1,169.254.193.10' (RSA) to the list of known hosts.
Password:
--- JUNOS 13.1X50-D15.1 built 2013-10-31 14:06:52 UTC
{master}
qfabric-admin@RSNG1> start shell
% su
Password:
root@RSNG1%
root@RSNG1% pwd
/var/home/qfabric-admin

From that point, whichever group you are logged in to, you can issue Linux commands. For instance, I wanted to check system processes along with their CPU and memory consumption.

% ps -auxw | awk -F' ' '{print $1,$2,$3,$4}' | sed -n -e 1,4p
USER PID %CPU %MEM
root 31 93.9 0.0
root 32 93.9 0.0
root 33 93.9 0.0

I encourage all QFabric admins to get into the habit of using the shell as a troubleshooting tool. You won't regret it.

Monday, June 2, 2014

Configure QFabric with TACACS+ Authentication

AAA (authentication, authorization and accounting) for this particular scenario will be deployed using the popular TACACS+ protocol between QFabric and a Unix server.


Common case of AAA

Authentication and Authorization in QFabric Side

set system authentication-order tacplus
set system authentication-order password
set system tacplus-server x.x.x.x port 49
set system tacplus-server x.x.x.x secret "$$$$$"
set system tacplus-server x.x.x.x source-address y.y.y.y
set system tacplus-options service-name junos-exec

Authentication and Authorization in Server Side

group = juniper_users {
          service = junos-exec {
          allow-commands = "(^show+)|(^[/.])|(^ping)|(^telnet)|(^traceroute)|(^quit)|(^monitor)"
          deny-commands = " *"
        }
}

user = juniper170 {
        member = juniper_users

}
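
To get a feel for what the allow-commands expression above lets through, the sketch below tries a few commands against it in Python. This is only an approximation: it assumes the expression is applied as an ordinary extended regular expression to the command text, and Python's re module is not the engine the device uses. The sample commands are my own.

```python
import re

# The allow-commands value from the group definition above.
ALLOW = r"(^show+)|(^[/.])|(^ping)|(^telnet)|(^traceroute)|(^quit)|(^monitor)"

def is_allowed(command):
    """True if the command text matches the allow-commands expression."""
    return re.search(ALLOW, command) is not None

for command in ("show route", "ping 192.168.30.190", "configure", "request system reboot"):
    print(command, "->", "allowed" if is_allowed(command) else "not matched")
```

Commands that are not matched fall through to the deny-commands expression.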

User juniper170 already exists in TACACS+ and belongs to the juniper_users group, which is why it is accepted by the TACACS+ server:

login as: juniper170
juniper170@y.y.y.y's password:
Last login: Wed May 21 09:40:16 2014 from 172.172.1.2
Juniper QFabric Director 13.1.8347 2013-11-05 04:54:03 UTC

juniper170@qfabric>

Capturing Communication

We want to make sure the communication flow between QFabric and the TACACS+ server is working properly in both directions. For testing purposes, run tcpdump on the server side and open the packet capture with Wireshark.



Encrypted tacplus packet

To understand the TACACS+ protocol, here is a simple trick to decrypt the packets, since they are encrypted by default.


Click on Edit and then Preferences:


Search for TACACS+ protocol and enter Encryption Key, which is nothing but your secret key:


Filter only tacplus protocol:

Now you will see the decrypted request, which contains the username and so on:



Accounting in QFabric Side

set system accounting destination tacplus server x.x.x.x secret "$$$$$"
set system accounting destination tacplus server x.x.x.x single-connection
set system accounting destination tacplus server x.x.x.x source-address y.y.y.y

The accounting feature lets the admin always know what users are doing in remote sessions:

tail -f /var/log/tac.log | grep juniper170


Final Configuration in QFabric Side

After all the work done above, we end up with a final configuration that looks like the lines below:

set system authentication-order tacplus
set system authentication-order password
set system tacplus-server x.x.x.x port 49
set system tacplus-server x.x.x.x secret "$$$$$"
set system tacplus-server x.x.x.x source-address y.y.y.y
set system tacplus-options service-name junos-exec
set system accounting destination tacplus server x.x.x.x secret "$$$$$"
set system accounting destination tacplus server x.x.x.x single-connection
set system accounting destination tacplus server x.x.x.x source-address y.y.y.y

Tuesday, May 13, 2014

Simple port mirroring in QFabric


Sometimes it is necessary to check exactly what is happening on a service interface of QFabric, and this is where port mirroring comes into play.

Only a few lines can build up this functionality:
set ethernet-switching-options analyzer pmtest input ingress interface nod14:ge-0/0/20.0
set ethernet-switching-options analyzer pmtest input egress interface nod14:ge-0/0/20.0
set ethernet-switching-options analyzer pmtest output interface nod14:ge-0/0/22.0

Once it is committed, connect your packet capture device, such as a laptop with Wireshark installed, to the desired port and check in QFabric that port mirroring is working fine:

root@qfabric# run show analyzer
Analyzer name                    : pmtest
  Output interface               : nod14:ge-0/0/22.0
  Ingress monitored interfaces   : nod14:ge-0/0/20.0
  Egress monitored interfaces    : nod14:ge-0/0/20.0

[edit]
root@qfabric#
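
If you set up analyzers often, the three lines can be generated instead of typed. The helper below is a hypothetical convenience of my own, not a Juniper tool; it only builds the same set commands used in this post from an analyzer name, a monitored interface and an output interface.

```python
def analyzer_commands(name, monitored, output):
    """Build the set commands for a simple ingress+egress port mirror."""
    prefix = f"set ethernet-switching-options analyzer {name}"
    return [
        f"{prefix} input ingress interface {monitored}",
        f"{prefix} input egress interface {monitored}",
        f"{prefix} output interface {output}",
    ]

# Reproduces the configuration used in this post.
for line in analyzer_commands("pmtest", "nod14:ge-0/0/20.0", "nod14:ge-0/0/22.0"):
    print(line)
```

The printed lines can be pasted in configuration mode and committed as usual.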

Friday, April 11, 2014

Correlate interface names to the right node in Network Node Group



It happens that the QFabric log shows interface names with the FPC prefix. This can be very tricky while troubleshooting.

For instance, you can get a log line like this one:

root@qfabric> show log messages | last 100000 | match SNMP_TRAP_LINK_DOWN
Apr 10 16:25:05 qfabric NW-NG-0 mib2d[8296]: SNMP_TRAP_LINK_DOWN: ifIndex 1091568306, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-5/0/18

I think the best way to find out the FPC number of each QFX3500 node is to combine two commands, as follows. First, log in to the Network Node Group using request component login.

root@qfabric> request component login NW-NG-0
Warning: Permanently added 'dcfnode-default---nw-ine-0,169.254.192.34' (RSA) to the list of known hosts.
Password:
--- JUNOS 13.1X50-D15.1 built 2013-10-31 14:06:44 UTC
{master}

qfabric-admin@NW-NG-0>

At this point, issue show virtual-chassis status:

qfabric-admin@NW-NG-0> show virtual-chassis status

Preprovisioned Virtual Chassis
Virtual Chassis ID: 0000.0022.0000
                              Mstr
Member ID  Status   Model     prio  Role     Serial No
0 (FPC 0)  Prsnt    qfx3500   0     Linecard R0000-C
1 (FPC 1)  Prsnt    qfx3500   0     Linecard R1111-C
2 (FPC 2)  Prsnt    qfx3500   0     Linecard R2222-C
3 (FPC 3)  Prsnt    qfx3500   0     Linecard R3333-C
4 (FPC 4)  Prsnt    qfx3500   0     Linecard R4444-C
5 (FPC 5)  Prsnt    qfx3500   0     Linecard R5555-C
6 (FPC 6)  Prsnt    qfx3500   0     Linecard R6666-C
7 (FPC 7)  Prsnt    qfx3500   0     Linecard R7777-C
8 (FPC 8)  Prsnt    fx-jvre   128   Backup   aaaaaaaa-4797-11e3-98ac-111111111111
9 (FPC 9)  Prsnt    fx-jvre   128   Master*  bbbbbbbb-4797-11e3-99a8-222222222222

{master}
qfabric-admin@NW-NG-0>

The output above shows the FPC number of each QFX3500 node device; the next step is to find out which serial number belongs to which node. This can be done by showing the fabric administration inventory:

root@qfabric> show fabric administration inventory
Item                    Identifier              Connection      Configuration
Node group
  NW-NG-0                                       Connected       Configured
    nod14               R6666-C                 Connected
    nod15               R3333-C                 Connected
    nod16               R1111-C                 Connected
    nod09               R5555-C                 Connected
    nod10               R2222-C                 Connected
    nod11               R7777-C                 Connected
    nod12               R1111-C                 Connected
    nod13               R4444-C                 Connected
  RSNG1                                         Connected       Configured
    nod01               R0677-C                 Connected
    nod02               W0137-C                 Connected
  RSNG2                                         Connected       Configured
    nod03               R2292-C                 Connected
    nod04               R0603-C                 Connected
  RSNG3                                         Connected       Configured
    nod05               R0818-C                 Connected
    nod06               R4700-C                 Connected
  RSNG4                                         Connected       Configured
    nod07               R2215-C                 Connected
    nod08               R4914-C                 Connected
Interconnect device
  interconnect1         IC-R1111-C              Connected       Configured
    R2692-C/RE0                                 Connected
  interconnect2         IC-W2222-C              Connected       Configured
    W2300-C/RE0                                 Connected
Fabric manager
  FM-0                                          Connected       Configured
Fabric control
  FC-0                                          Connected       Configured
  FC-1                                          Connected       Configured
Diagnostic routing engine
  DRE-0                                         Connected       Configured
Director group
  DG0                   0000000450000000        Connected
  DG1                   0111111451111111        Connected

root@qfabric> 

In conclusion, for our example, ge-5/0/18 is on FPC 5, whose serial number R5555-C belongs to nod09, so the interface is port 18 of nod09.
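
The lookup is just a join of the two outputs on the serial number, so it is easy to script. The Python sketch below works on a few lines pasted from each command; the function names are my own, and the nodNN:type-0/pic/port form of the result is an assumption based on the interface names used elsewhere on this blog.

```python
import re

# A few lines pasted from "show virtual-chassis status".
VC_STATUS = """\
4 (FPC 4)  Prsnt    qfx3500   0     Linecard R4444-C
5 (FPC 5)  Prsnt    qfx3500   0     Linecard R5555-C
6 (FPC 6)  Prsnt    qfx3500   0     Linecard R6666-C
"""

# The matching lines from "show fabric administration inventory".
INVENTORY = """\
    nod14               R6666-C                 Connected
    nod09               R5555-C                 Connected
    nod13               R4444-C                 Connected
"""

def fpc_to_serial(text):
    """Map FPC number -> serial number (last column of each member line)."""
    mapping = {}
    for line in text.splitlines():
        match = re.match(r"\s*\d+ \(FPC (\d+)\)", line)
        if match:
            mapping[int(match.group(1))] = line.split()[-1]
    return mapping

def serial_to_node(text):
    """Map serial number -> node name from 'name serial Connected' lines."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2] == "Connected":
            mapping[fields[1]] = fields[0]
    return mapping

def fabric_name(ifname):
    """Translate a logged name such as ge-5/0/18 into node:interface form."""
    kind, fpc, pic, port = re.fullmatch(r"([a-z]+)-(\d+)/(\d+)/(\d+)", ifname).groups()
    node = serial_to_node(INVENTORY)[fpc_to_serial(VC_STATUS)[int(fpc)]]
    return f"{node}:{kind}-0/{pic}/{port}"

print(fabric_name("ge-5/0/18"))  # nod09:ge-0/0/18
```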

Tuesday, April 8, 2014

Checking SFP modules



Wondering how to find out remotely which SFP module is plugged into a port of your QFX3500 node? Then you need the following command:

root@qfabric> show chassis hardware node-device <node_name>

Example:

Connect to QFabric's virtual IP address through SSH and from operational mode issue the command:

root@qfabric> show chassis hardware node-device nod01
Routing Engine 1          BUILTIN      BUILTIN           QFX Routing Engine
nod01            REV 04   750-999999   R9999-C           QFX3500-48S4Q
  CPU                     BUILTIN      BUILTIN           FPC CPU
  PIC 0                   BUILTIN      BUILTIN           48x 10G-SFP+
    Xcvr 0       REV 01   740-021309   APQ3DLL           SFP+-10G-LR
    Xcvr 2       REV 01   740-021309   APQ3DM4           SFP+-10G-LR
    Xcvr 6       REV 02   740-013111   D250924           SFP-T
    Xcvr 7       REV 02   740-011613   PPM20MK           SFP-SX

It only shows ports that have a transceiver (Xcvr) plugged in.

If a port doesn't have an SFP, it just won't show up. You can double-check with:

root@qfabric# run show interfaces terse | match nod01
nod01:xe-0/0/0          up    down
nod01:xe-0/0/2          up    down
nod01:ge-0/0/6          up    up
nod01:ge-0/0/6.0        up    up   eth-switch
nod01:ge-0/0/7          up    down
nod01:ge-0/0/7.0        up    down eth-switch

Matching both outputs, you can say that nod01:xe-0/0/0 has a transceiver of type SFP+-10G-LR, which is 10 Gbit/s Ethernet over single-mode fiber.
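
The same matching can be done for every port at once. The Python sketch below pairs the Xcvr lines with the physical interfaces by port number, using the two outputs shown in this post; it assumes all transceivers sit in PIC 0, as they do here, and the function names are my own.

```python
import re

# Lines pasted from "show chassis hardware node-device nod01".
HARDWARE = """\
  PIC 0                   BUILTIN      BUILTIN           48x 10G-SFP+
    Xcvr 0       REV 01   740-021309   APQ3DLL           SFP+-10G-LR
    Xcvr 2       REV 01   740-021309   APQ3DM4           SFP+-10G-LR
    Xcvr 6       REV 02   740-013111   D250924           SFP-T
    Xcvr 7       REV 02   740-011613   PPM20MK           SFP-SX
"""

# Lines pasted from "show interfaces terse | match nod01".
TERSE = """\
nod01:xe-0/0/0          up    down
nod01:xe-0/0/2          up    down
nod01:ge-0/0/6          up    up
nod01:ge-0/0/6.0        up    up   eth-switch
nod01:ge-0/0/7          up    down
nod01:ge-0/0/7.0        up    down eth-switch
"""

def transceivers(text):
    """Map port number -> transceiver description from the Xcvr lines."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Xcvr":
            result[int(fields[1])] = fields[-1]
    return result

def interface_transceivers(hardware, terse):
    """Map each physical interface name to its transceiver type."""
    by_port = transceivers(hardware)
    result = {}
    for line in terse.splitlines():
        name = line.split()[0]
        # Logical units such as ge-0/0/6.0 do not match and are skipped.
        match = re.fullmatch(r"\S+:[a-z]+-\d+/\d+/(\d+)", name)
        if match and int(match.group(1)) in by_port:
            result[name] = by_port[int(match.group(1))]
    return result

for name, kind in interface_transceivers(HARDWARE, TERSE).items():
    print(name, kind)
```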