Monday, January 4, 2010

Layer by Layer Troubleshooting with a Cisco Router

Every network admin is going to have trouble with network links on a Cisco router, at one point or another. The best way to troubleshoot any networking issues is to use the OSI model and go layer by layer. In my article How to use the OSI Model to Troubleshoot Networks, we talked about the different troubleshooting approaches and how to use them to troubleshoot your network, in general. In this article, you will find out how to use the OSI model to troubleshoot, bottom up, using a Cisco router.

OSI Model - Bottom Up Troubleshooting

If you will recall, the OSI model starts with the physical layer (layer 1) and goes up to layer 7 (application). When troubleshooting with a Cisco router, much of your time will be spent working in layers 1-3. They are:

  • Layer 3 - Network
  • Layer 2 - Data Link
  • Layer 1 - Physical

Because these layers build on each other, layer 1 is the most critical: without layer 1, layer 2 will not function, and without layers 1 & 2, layer 3 will not function, and so on. For this reason, I start troubleshooting at layer 1, physical, and move up from there.

Router Troubleshooting at OSI Layer 1 & 2 - Physical & Data link

Remember, if layer 1 isn't up, nothing else will work, so make sure you start here. Examples of layer 1 are your T1 circuit or your Ethernet cable - physical connectivity. I usually troubleshoot layer 1 and layer 2 together because they are so closely paired. Examples of layer 2 - data link - are your line protocols (such as Ethernet, ATM, 802.11, PPP, frame-relay, or HDLC).

To troubleshoot at these layers, the first thing I would do on your router is a show interface. Here is an example of a LAN Gigabit Ethernet circuit:


Router# show interface
GigabitEthernet0/0 is up, line protocol is up
Hardware is BCM1125 Internal MAC, address is 0015.2b46.5000 (bia 0015.2b46.5000)
Description: LAN Connection to Data center
Internet address is 10.20.100.1/16
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, link type is autonegotiation, media type is RJ45
output flow-control is XON, input flow-control is XON
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: weighted fair
Output queue: 0/1000/64/0 (size/max total/threshold/drops)
Conversations 0/2/256 (active/max active/max total)
Reserved Conversations 0/0 (allocated/max allocated)
Available Bandwidth 750000 kilobits/sec
5 minute input rate 3218000 bits/sec, 1715 packets/sec
5 minute output rate 1390000 bits/sec, 2129 packets/sec
1416888620 packets input, 15402720 bytes, 0 no buffer
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 1556005 multicast, 0 pause input
0 input packets with dribble condition detected
1666663097 packets output, 573841802 bytes, 0 underruns
19 output errors, 0 collisions, 3 interface resets
0 babbles, 0 late collision, 0 deferred
19 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

Here is what a WAN T1 or T3 circuit might look like:

Router# show interface serial 3/0
Serial3/0 is up, line protocol is up
Hardware is DSXPNM Serial
Description: Sprint T3
Internet address is 10.2.100.2/30
MTU 4470 bytes, BW 9000 Kbit, DLY 200 usec,
reliability 255/255, txload 77/255, rxload 26/255
Encapsulation HDLC, crc 16, loopback not set
Keepalive set (10 sec)
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 18394
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 927000 bits/sec, 1914 packets/sec
5 minute output rate 2752000 bits/sec, 1504 packets/sec
1560997932 packets input, 3254680247 bytes, 0 no buffer
Received 255480 broadcasts, 1 runts, 1 giants, 0 throttles
1567 input errors, 1567 CRC, 976 frame, 496 overrun, 0 ignored, 908 abort
1303636803 packets output, 3737276508 bytes, 0 underruns
0 output errors, 0 collisions, 3 interface resets
0 output buffer failures, 0 output buffers swapped out
1 carrier transitions
DSU mode 1, bandwidth 9000, real bandwidth 9000, scramble 0

Here is the quick version:

Router# show ip interface brief
Interface IP-Address OK? Method Status Protocol
GigabitEthernet0/0 10.20.100.1 YES NVRAM up up
Serial3/0 10.2.100.2 YES NVRAM up up
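
If you collect this output from many routers, a small script can pick out the interfaces that are not up/up. What follows is only a minimal sketch in Python, assuming the whitespace-separated column layout shown above; the function names parse_ip_interface_brief and not_up_up are my own, not part of any Cisco tool.

```python
def parse_ip_interface_brief(output):
    """Parse 'show ip interface brief' text into a list of dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines such as the prompt, and the column header.
        if len(fields) < 6 or fields[0] == "Interface":
            continue
        rows.append({
            "interface": fields[0],
            "ip_address": fields[1],
            # Status can be two words ("administratively down"),
            # so take everything between Method and Protocol.
            "status": " ".join(fields[4:-1]),
            "protocol": fields[-1],
        })
    return rows


def not_up_up(rows):
    """Return the names of interfaces that are not up/up."""
    return [r["interface"] for r in rows
            if (r["status"], r["protocol"]) != ("up", "up")]


sample = """\
Router# show ip interface brief
Interface IP-Address OK? Method Status Protocol
GigabitEthernet0/0 10.20.100.1 YES NVRAM up up
Serial3/0 10.2.100.2 YES NVRAM up up
"""

print(not_up_up(parse_ip_interface_brief(sample)))  # prints []
```

Both interfaces in the sample are up/up, so the list comes back empty; any interface name that does show up in it is where I would start looking.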

Here is what you look for:

  • Is the interface UP?
  • Is the line protocol UP?
  • Unless both the interface and the line protocol are up, your connection is never going to work.
  • To resolve a line down, I look at the cable or the keepalives.
  • To resolve a line protocol down, check to make sure that the protocols match on each side of the connection (notice the "Encapsulation" line and the "line protocol" status on each of the interfaces above).
  • Are you taking input, CRC, framing, or other errors on the line (notice how the serial interface above does show errors)? If so, check your cable or contact your provider.
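
The checklist above can be sketched as a short Python script that reads saved show interface output and pulls out the status line and the main error counters. This is only an illustration under my own assumptions: the name summarize_interface and the choice of counters are mine, and the patterns are written against the sample output in this article, so other platforms or IOS versions may format things differently.

```python
import re

# "Serial3/0 is up, line protocol is up"
STATUS_RE = re.compile(r"^(\S+) is (.+?), line protocol is (\S+)")
# "1567 input errors, 1567 CRC, 976 frame, ..."
COUNTER_RE = re.compile(
    r"(\d+) (input errors|CRC|frame|overrun|output errors|"
    r"collisions|interface resets)\b"
)


def summarize_interface(output):
    """Extract interface status and a few error counters."""
    summary = {"name": None, "interface": None,
               "line_protocol": None, "counters": {}}
    for line in output.splitlines():
        line = line.strip()
        match = STATUS_RE.match(line)
        if match:
            (summary["name"], summary["interface"],
             summary["line_protocol"]) = match.groups()
            continue
        for value, label in COUNTER_RE.findall(line):
            summary["counters"][label] = int(value)
    return summary


sample = """\
Serial3/0 is up, line protocol is up
1567 input errors, 1567 CRC, 976 frame, 496 overrun, 0 ignored, 908 abort
0 output errors, 0 collisions, 3 interface resets
"""

result = summarize_interface(sample)
print(result["interface"], result["line_protocol"])
print(result["counters"])
```

A non-zero CRC or frame count on a link that is up/up is the cue to check the cable or call the provider, as described in the last bullet.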

In general, verify that you have a good cable on each side, verify that line protocols match, and that clocking settings are correct.

If this is an Ethernet connection, is there a link light on the switch?

If this is a serial connection, do you have an external CSU/DSU? If it is an external CSU, check that the Carrier Detect (CD) light & data terminal ready (DTR) lights are on. If not, contact your provider. This also applies if you have an internal Cisco WIC CSU card. If that is the case, take a look at this Cisco link on understanding the lights on that card.

You can, of course, use the Cisco IOS test commands to test your network interfaces with internal staff and with your telecommunications providers.

Do not proceed to upper level layers until your Physical interface on the router shows as being UP and your line protocol is UP. Until then, don't worry about IP addressing, pinging, access-lists or anything like that.

Router Troubleshooting at OSI Layer 3 - Network

Once you have layers 1 & 2 working (your show interface command shows the line is "UP & UP"), it is time to move on to layer 3 - the OSI Network layer. The easiest way to see if layer 3 is working is to ping the remote side of the LAN or WAN link from this router. Make sure you ping as close as possible to the router you are trying to communicate with - from one side across to the other side.

Here are examples of successful & failed pings:

Router# ping 10.2.100.2

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.2.100.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
Router#
Router#
Router#
Router#
Router# ping 1.1.1.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Router#
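
If you are logging ping results from a script, the success rate line is the part worth extracting. Here is a minimal Python sketch, assuming the "Success rate is N percent (x/y)" wording shown above; the function name ping_success is my own.

```python
import re

PING_RE = re.compile(r"Success rate is (\d+) percent \((\d+)/(\d+)\)")


def ping_success(output):
    """Return percent/received/sent from IOS ping output, or None."""
    match = PING_RE.search(output)
    if match is None:
        return None
    percent, received, sent = (int(g) for g in match.groups())
    return {"percent": percent, "received": received, "sent": sent}


good = "Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms"
bad = "Success rate is 0 percent (0/5)"

print(ping_success(good))  # {'percent': 100, 'received': 5, 'sent': 5}
print(ping_success(bad))   # {'percent': 0, 'received': 0, 'sent': 5}
```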

The easiest way to check the status of Layer 3 - the network layer - is to do a show ip interface brief, as I did above. Here is an example:

Router# show ip interface brief
Interface IP-Address OK? Method Status Protocol
GigabitEthernet0/0 10.20.100.1 YES NVRAM up up
Serial3/0 10.2.100.2 YES NVRAM up up

Notice the IP addressing on each of these interfaces. Also do a show running-config (you can even specify a single interface, like this):

Router# show running-config int serial3/0
Building configuration...

Current configuration : 225 bytes
!
interface Serial3/0
description Sprint T3
bandwidth 9000
ip address 10.2.100.2 255.255.255.252
no ip proxy-arp
no ip mroute-cache
dsu mode 1
dsu bandwidth 9000
no cdp enable
end

Router#

I would recommend taking this interface configuration and comparing it, side by side, with the configuration of the remote end of the WAN connection to ensure the two are compatible. Ask yourself questions like:

  • Are these interfaces on the same IP network?
  • Do these interfaces have the same subnet mask?
  • Are there any access-lists (ACL) that are blocking your traffic?
  • Can you remove all optional IP features to make sure that the basic configuration works before adding additional features that could be causing trouble?

Here is an example. Look at the two interfaces below. What is the real problem, causing these two to not communicate?

Router 1

interface Serial3/0
description Sprint T3 - TO ROUTER 2
bandwidth 9000
ip address 10.2.100.2 255.255.255.252

Router 2

interface Serial3/0
description Sprint T3 - TO ROUTER 1
bandwidth 1500
ip address 10.2.100.5 255.255.255.252

No, there is no problem with the bandwidth statement. The bandwidth statement does not change the actual speed of the link; it is informational and is used by routing protocols to select the best route. The real problem here is that the second router's serial interface is not on the same IP subnet as router #1. Even though they have the same subnet mask, the 10.2.100.5 IP address will never be able to communicate with the 10.2.100.2 IP address because, although directly connected, they are on different networks: with a 255.255.255.252 mask, 10.2.100.2 is in 10.2.100.0/30 and 10.2.100.5 is in 10.2.100.4/30.
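
If you want to double-check the subnet math rather than do it in your head, Python's standard ipaddress module can do it for you. This is just a quick sketch using the two addresses from the example; the helper name same_subnet is my own.

```python
import ipaddress


def same_subnet(a, b):
    """Return True if two 'address/mask' strings share one IP network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


router1 = "10.2.100.2/255.255.255.252"
router2 = "10.2.100.5/255.255.255.252"

print(ipaddress.ip_interface(router1).network)  # 10.2.100.0/30
print(ipaddress.ip_interface(router2).network)  # 10.2.100.4/30
print(same_subnet(router1, router2))            # False
```

The only usable host addresses in 10.2.100.0/30 are 10.2.100.1 and 10.2.100.2, so router 2 would need 10.2.100.1 for the link to work.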

Let's say that you are now able to ping across the link, from one side to another. While that is a great sign, it doesn't always mean that everything is "fixed". You still may not be able to communicate from a client on the LAN of one router, to a client on the LAN of another router, due to things like improperly configured IP routing protocols.

For one LAN to communicate to another LAN, through routers (through a WAN, usually), you MUST have either static routes or dynamic routes configured. To ensure you have a route configured for the network you are trying to reach, do:

Router# show ip route

and look at

Router# show ip protocols
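
When you read the routing table, the question is whether any entry covers the destination, and if several do, the most specific one wins. The Python sketch below illustrates that longest-prefix idea only; the routes dictionary is a made-up example built from the interfaces in this article, not real router output, and best_route is my own name.

```python
import ipaddress


def best_route(destination, routes):
    """Return (prefix, next_hop) of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The longest prefix (largest prefix length) is the most specific route.
    network, next_hop = max(matches, key=lambda item: item[0].prefixlen)
    return str(network), next_hop


# Hypothetical routing table: prefix -> outgoing interface.
routes = {
    "10.20.0.0/16": "GigabitEthernet0/0",
    "10.2.100.0/30": "Serial3/0",
}

print(best_route("10.2.100.1", routes))   # ('10.2.100.0/30', 'Serial3/0')
print(best_route("192.168.1.1", routes))  # None
```

A result of None is the situation to watch for: with no matching route and no default route, the router has nowhere to send the packet.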

For troubleshooting layer 3, all the way up, look at the output of this command:

Router# show ip interface

GigabitEthernet0/0 is up, line protocol is up
Internet address is 10.20.100.1/16
Broadcast address is 255.255.255.255
Address determined by non-volatile memory
MTU is 1500 bytes
Helper address is not set
Directed broadcast forwarding is disabled
Multicast reserved groups joined: 224.0.0.10
Outgoing access list is not set
Inbound access list is not set
Proxy ARP is disabled
Local Proxy ARP is disabled
Security level is default
Split horizon is enabled
ICMP redirects are always sent
ICMP unreachables are always sent
ICMP mask replies are never sent
IP fast switching is enabled
IP fast switching on the same interface is disabled
IP Flow switching is enabled
IP CEF switching is enabled
IP CEF Flow Fast switching turbo vector
IP multicast fast switching is disabled
IP multicast distributed fast switching is disabled
IP route-cache flags are Fast, Flow cache, CEF, Subint Flow
Router Discovery is disabled
IP output packet accounting is disabled
IP access violation accounting is disabled
TCP/IP header compression is disabled
RTP/IP header compression is disabled
Policy routing is disabled
Network address translation is enabled, interface in domain inside
WCCP Redirect outbound is disabled
WCCP Redirect inbound is disabled
WCCP Redirect exclude is disabled
BGP Policy Mapping is disabled

Router Troubleshooting at OSI Layers 4 - 7

Now, let's say that you have made it to the point where you can ping from LAN to LAN, through your WAN. Congratulations - that is a very good sign. If you are still having trouble, the problem is most likely in OSI layers 4-7. Here are those layers listed out and possible issues you might experience in each layer:

  • Layer 4 - Transport - in the transport layer are TCP and UDP - you could have an ACL or QoS feature blocking or slowing this traffic. Your TCP traffic could also be fragmented to the point that it cannot be reassembled. Another possibility is that you are not receiving an ACK back for traffic that was successfully sent.
  • Layer 5 - Session - in the session layer are protocols like SQL, NFS, SMB, or RPC - you could be taking errors on any one of these session protocols. I would recommend using a protocol analyzer like Wireshark to analyze your session data.
  • Layer 6 - Presentation - in the Presentation layer are data encryption, compression, and formatting - your VPN tunnel could be failing or perhaps you are sending one type of data (like an MPEG) and the receiver is trying to view it as a WMV file.
  • Layer 7 - Application - in the Application layer are, of course, your applications like FTP, HTTP, SCP, TFTP, TELNET, SSH, and more - you could be trying to connect to a telnet server with the SSH protocol, for example.
  • Layer 8 - End User - the standing joke is that "Layer 8" is the user - the user could be just mistyping their username or password or you, the network admin, could have been troubleshooting the wrong IP address all along.
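
For the layer 4 case, a quick test from a client on the LAN is to see whether a TCP connection to the server's port completes at all. The Python sketch below does only that, using the standard socket module; the function name tcp_port_open and the 3 second default timeout are my own choices.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes."""
    try:
        # create_connection performs the full TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, if tcp_port_open("10.20.100.50", 22) returns False while a ping to the same (hypothetical) address succeeds, that points toward an ACL, a firewall, or a service that is not listening, rather than a layer 1-3 problem.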

Summary

In summary, using the OSI model to troubleshoot connectivity issues is one of the fastest and most efficient ways to troubleshoot a network issue. Even if someone calls you to work on a Windows share problem, all of the same principles in this article apply to that troubleshooting process. So the next time you work on a network issue, remember the OSI model and how to use the bottom-up approach to troubleshooting! It could save you a whole lot of time!