Ph0t0n

Reputation: 996

Efficiency in sending UDP packets to the same address

I am reworking some of the infrastructure in an existing application that sends UDP data packets to 1...N addresses (often multicast). Currently there are, let's say, T transmitter objects, and in some cases, all of the transmitters are sending to the same address.

So to simplify and provide an example case, let's say there are 3 transmitter objects and they all need to send to a single specific address. My question is: which is more efficient?

Option 1) Put a mutex around a single socket and have all the transmitters (T) share the same socket.

T----\
T----->Socket
T----/

Option 2) Use three separate sockets, all sending to the same location.

T----->Socket 1
T----->Socket 2
T----->Socket 3
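To make the two options concrete, here is a rough sketch in Python (my real app isn't Python; the destination address, payload size, and packet counts below are placeholders, not values from my application):

```python
import socket
import threading

DEST = ("127.0.0.1", 9999)   # placeholder address; the real target may be multicast
PAYLOAD = b"\0" * 1200       # placeholder payload
NUM_TX = 3                   # three transmitter objects, as in the example
PACKETS_PER_TX = 10          # placeholder count

# Option 1: all transmitters share one socket, serialized by a mutex.
def option1_shared_socket():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    lock = threading.Lock()

    def transmit():
        for _ in range(PACKETS_PER_TX):
            with lock:                     # only one thread touches the socket at a time
                sock.sendto(PAYLOAD, DEST)

    threads = [threading.Thread(target=transmit) for _ in range(NUM_TX)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    sock.close()

# Option 2: each transmitter owns its own socket; no application-level lock.
def option2_socket_per_transmitter():
    def transmit():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for _ in range(PACKETS_PER_TX):
            sock.sendto(PAYLOAD, DEST)
        sock.close()

    threads = [threading.Thread(target=transmit) for _ in range(NUM_TX)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```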

I suspect that with the second option, under the hood, the OS or the NIC puts a mutex around the final transmit anyway, so in the big picture, Option 2 is probably not a whole lot different from Option 1.

I will probably set up an experiment on my development PC next week, but there's no way I can test all the potential computer configurations that users might install on. I also realize there are different implementations (Windows vs. Linux, different NIC chipset manufacturers, etc.), but I'm wondering if anyone has past experience or architectural knowledge that could shed light on an advantage of one option over the other.

Thanks!

Upvotes: 0

Views: 265

Answers (1)

Ph0t0n

Reputation: 996

After running some benchmarks on a Windows 10 computer, I have an "answer" that at least gives me a rough idea of what to expect. I can't be 100% sure that every system will behave the same way, but most of the servers I run use Intel NICs and Windows 10, and my typical packet sizes are around 1200 bytes, so the answer at least makes me comfortable that it's correct for my particular scenario. I decided to post the results here in case anyone else can make use of the experiment.

I built a simple command-line app that would first spawn T transmitter threads, all using a single socket with a mutex around it. Immediately after, it would run another test with the same number of transmitters, but this time each transmitter had its own socket so no mutex was needed (although I'm sure there was a locking mechanism at some lower level). Each transmitter blasts out packets as fast as possible.
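The overall structure of that benchmark looks roughly like this, sketched in Python rather than the native app I actually used (the destination, function names, and the scaled-down packet count are placeholders):

```python
import socket
import threading
import time

DEST = ("127.0.0.1", 9999)   # placeholder; the real test sent to another machine
PACKET = b"\0" * 1200        # 1200-byte packets, as in the test setup
THREAD_COUNT = 3
TOTAL_PACKETS = 9_000        # scaled way down from the 2,700,000 in the real run

def shared_socket_senders():
    """One socket plus one mutex shared by every transmitter thread."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    lock = threading.Lock()

    def send(n):
        for _ in range(n):
            with lock:
                sock.sendto(PACKET, DEST)
    return send

def per_thread_socket_senders():
    """Each transmitter thread opens its own socket; no application lock."""
    def send(n):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for _ in range(n):
            sock.sendto(PACKET, DEST)
        sock.close()
    return send

def time_run(sender_factory):
    """Spawn the transmitter threads, blast out all packets, return elapsed seconds."""
    send = sender_factory()
    per_thread = TOTAL_PACKETS // THREAD_COUNT
    threads = [threading.Thread(target=send, args=(per_thread,))
               for _ in range(THREAD_COUNT)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"SharedSocket    : {time_run(shared_socket_senders):.4f} sec")
    print(f"MultipleSockets : {time_run(per_thread_socket_senders):.4f} sec")
```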

This is the test setup I used:

  • 2,700,000 packets at 1200 bytes each.
  • Release mode, 64 bit.
  • i7-3930K CPU, Intel Gigabit CT PCIE adapter.

And here are the results:

  • 1 Transmitter : SharedSocket = 28.2650 sec : 1 Socket = 28.2073 sec.
  • 3 Transmitters : SharedSocket = 28.4485 sec : MultipleSockets = 27.5190 sec.
  • 6 Transmitters : SharedSocket = 28.7414 sec : MultipleSockets = 27.3485 sec.
  • 12 Transmitters : SharedSocket = 27.9463 sec : MultipleSockets = 27.3479 sec.

As expected, the test with only one thread had almost the same time for both. However, in the cases with 3, 6, and 12 transmitters, there is roughly a 3% performance improvement from using one socket per thread instead of sharing a socket. It's not a massive difference, but if you're trying to squeeze every last ounce out of your system, it could be a useful statistic. My particular application transmits a massive amount of video.

Just as a sanity check, here is a screenshot of Task Manager's network page on the server side. You can see a throughput increase about halfway through the test, which coincides with the switch to the second, multiple-socket test. I included a screencap of the client computer as well (it was a Windows 7 box).

Screenshot

Upvotes: 1
