gjain
gjain

Reputation: 4518

Lossy network versus Congested network

Suppose, there is a network which gives a lot of Timeout errors when packets are transmitted over it. Now, timeouts can happen either because the network itself is inherently lossy (say, poor hardware) or it might be that the network is highly congested, due to which network devices are losing packets in between, leading to Timeouts. Now, what additional statistics about the traffic being transmitted (like Missing Packets errors etc.) are required that might help us to find out whether timeouts are happening due to poor hardware, or too much network load. Please note that we have access only to one node in the network (from which we are transmitting packets) and as such, we cannot get to know the load being put by other nodes on the network. Similarly, we don't really have any information about the hardware being used in the network. Statistics is all that we have.

Upvotes: 2

Views: 3469

Answers (2)

Seth Noble
Seth Noble

Reputation: 3303

In practice, nearly all loss on terrestrial network paths is due to either congestion or firewalls. Loss due to bit-errors is extremely rare. Even on wireless networks, forward error correction handles most bit/media/transmission errors. Congestion can be caused by a lot of different factors: any given network path will involve dozens of devices and if any one of them becomes overloaded for even a moment, packets will be dropped.

The only way to tell the difference between congestion induced packet loss and media errors is that media errors will occur independent of load. In other words, the loss rate will be the same whether you are sending a lot of data or only a little data.

To test that, you will need some control, or at least knowledge, of the load on the path. Since you don't have control and the only knowledge you have is from source-node observation, the best you can do is to take test samples (using ping is the easiest) around the clock and throughout the week, recording loss rates and latencies. These should give you an idea of when the path is relatively idle. If loss rates remain significant even when the path is (probably) idle, then there might be a media-loss issue. But again, that is extremely rare.

For background, I have written a few articles on the subject:

Upvotes: 0

blankabout
blankabout

Reputation: 2637

A network node only has hardware information about its local collision domain, which on a standard network will be the cable that links the host to the switch.

All the TCP stack will know about lost packets is that it is not receiving acknowledgements so it needs to resend, there is no mechanism for devices (E.g. switches & routers) between a source and destination to tell the source that there is a problem.

Without access to any other nodes the only way to ascertain if your problem is load based would be to run a test that sends consistent traffic over the network for a long period, if the packet retry count per second/minute/hour remains the same then it would suggest that there is a hardware issue, if the losses only occur during peak traffic periods then the issue could be load related. Of course there could be a situation where misconfigured hardware issues will only be apparent during high traffic periods, this takes things back to the main problem which is that you need access to network stats from beyond your single node.

Upvotes: 1

Related Questions