Reputation: 99
So, I am working on a C++ application that currently uses C sockets to transfer data between peers. There are n peers, all running the same code. In the application logic, any peer may need to transfer (possibly large) data to any other peer, so connections are first opened between all pairs of peers. The requirement is that the application logic and the network transfers of (possibly large) data should be as fast as possible.
At present, between any 2 peers (say A and B), the application opens 2 types of connections: one where A is the server and B is the client, and vice versa. This was presumably done so that if A needs to transfer data to B and B to A concurrently, the whole thing can finish faster than with just one connection type between A and B. For each connection type (say, where A is the server and B the client), the application then opens 3 TCP connections (using C sockets). However, the way it's presently coded, it ends up using only one of these 3 connections.
Upon seeing this, I began to wonder: to make optimal use of N open connections, maybe one could use round-robin or some other policy to break the data into chunks and transfer them at the same time. However, it's not clear to me how many parallel TCP connections should be open, or what policy should be used between them. What factors does the answer depend on? For example, if I have 1000 TCP connections open, what's the harm? (ignoring system constraints like running out of ports, etc.)
If someone can throw light on how applications today make use of multiple parallel TCP connections to be most performant, that would be great. A quick Google search leads me to several research papers, but I am also interested in knowing how, for example, web browsers solve this problem.
Thanks!
UPDATE: After talking to a few people with more knowledge of TCP, I have come to have a better picture. Firstly, my premise that opening two types of connections between A and B (one where A is the client and B the server, and vice versa) would increase net throughput seems wrong. Opening one type of TCP connection between A and B should suffice, since TCP is full-duplex: data can flow from A to B and from B to A at the same time over a single connection. I found this link useful: Is TCP bidirectional or full-duplex?.
Also, to make use of the full bandwidth available to me, it may be better to open multiple TCP connections. I found this highly relevant link: TCP is it possible to achieve higher transfer rate with multiple connections?
But the question of how many such connections should be open still remains. It would be great if someone can answer that.
Upvotes: 5
Views: 2311
Reputation: 73304
When transferring data between two hosts, there is unlikely to be any significant throughput advantage to be obtained by using more than one TCP socket. With proper programming, a single TCP connection can saturate the link's bandwidth in both directions simultaneously (i.e. it can do full-duplex/2-way transfers at line speed). Splitting the data across multiple TCP connections merely adds overhead; in the best-case scenario, each of the N connections will transfer at 1/N the speed of the single connection (and in real life, less than that, due to additional packet headers, bandwidth contention, etc).
There is one potential (minor) benefit that can be realized by using multiple TCP streams, however -- that benefit is seen only in the case where the data being transferred in stream A is logically independent of the data being transferred in stream B. If that is the case (i.e. if the receiver can immediately make use of data in stream A, without having to wait for data in stream B to arrive first), then having multiple streams can make your data transfer somewhat more resilient to packet-dropouts.
For example, if stream A drops a packet, that will cause stream A to have to briefly pause while it retransmits the dropped packet, but in the meantime stream B's data may continue to flow without interruption, since stream B is operating independently from stream A. (If the A-data and the B-data were both being sent over the same TCP stream, OTOH, the B-data would be forced to wait for the lost A-packet to be retransmitted, since strict FIFO-ordering is always enforced within a TCP stream).
Note that this benefit is likely smaller than you might think, though, since in many cases the problem that caused one TCP stream to lose packets will also simultaneously cause any other TCP streams going over the same network path to lose packets too.
Upvotes: 2
Reputation: 891
You didn't specify an OS, so I will assume it's Linux we're talking about. I think you need to do some research into non-blocking I/O, say epoll or asio. It is currently the most effective and scalable way to work with many connections simultaneously.
You can start here, for example.
Some performance analysis can be found here or here.
Upvotes: 1