seth

Reputation: 139

Faster WinSock sendto()

I'm using Windows Server 2008, and my program is in C++. I'm using WinSock2 and sendto() in a while(true) loop to send my packets.

Code like so:

while(true)
{
    if(c == snd->max)
        c = snd->min;

    dest.sin_addr.S_un.S_addr = hosts[c];
    iphead->destaddr = hosts[c];

    sendto(s, castpacket, pktsz, 0, castdest, szsad);

    ++c;
}

I need to send as much data to as many IPs in my hosts std::vector as possible, as quickly as possible.

I'm currently running on an i7 930 server, and I can only achieve 350Mbps or so.

I currently split my program into 4 threads, all running the while loop with different servers assigned to each thread. Adding more threads or running more copies of the program results in lower throughput.

I have another program running listening for replies from the servers. I get the servers from a master list and add them to my array. The problem at the moment is that it takes too long to go through all of them, and I want to check them regularly.

How exactly can I optimize my program/loop/sending here?

Upvotes: 4

Views: 3332

Answers (2)

Remy Lebeau

Reputation: 596352

Have a look at WinSock's Registered I/O Extensions (RIO) API:

The RIO API is a new extension to Windows Sockets (Winsock) and provides an opportunity for you to reduce network latency, increase message rates and improve the predictability of response times for applications that require very high performance, very high message rates, and predictability. RIO API extensions allow applications that process large numbers of small messages to achieve higher I/O operations per second (IOPS) with reduced jitter and latency. Server loads with high message rates and low latency requirements benefit most from RIO API extensions, including applications for financial services trading and high speed market data reception and dissemination. In addition, RIO API extensions provide high IOPS when you deploy many Hyper-V virtual machines (VMs) on a single physical computer.

RIO enables send and receive operations to be performed with pre-registered buffers using queues for requests and completions. Send and receive operations are queued to a request queue that is associated with a Winsock socket. Completed I/O operations are inserted into a completion queue, and many different sockets can be associated with the same completion queue. Completion queues can also be split between send and receive completions. Completion operations, such as polling, can be performed entirely in user-mode and without making system calls.

The use of registered buffers streamlines the network related processing, reduces jitter, and additionally makes it possible for application developers to specify the NUMA node affinity of networking buffers used by the protocol stack — further enhancing overall performance, and reducing latency and jitter characteristics.

RIO API extensions support Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and multicast UDP, as well as both IPv4 and IPv6.

You can use RIO API extensions if you want to achieve any of the following:

  • Scale up your server to minimize CPU utilization per message

  • Reduce the latency contribution and jitter of the networking stack to a minimum

  • Handle very high rates of multicast or UDP traffic

Use of the RIO API extensions has the following additional benefits:

  • RIO works on all editions of Windows Server 2012.

  • RIO is compatible with normal network adapters and does not require special network adapters or RDMA.

  • RIO is fully compatible with existing Windows networking features, including RSS, RSC, network interface card teaming, and static offloads.

  • RIO works with virtualization when you deploy Hyper-V in Windows Server 2012.

  • RIO sockets use the standard Windows networking stack and standard TCP/IP and UDP protocols.
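To make the queue model above concrete, here is a minimal polling-mode sketch of a RIO UDP sender. Error handling is omitted, and the buffer sizes, queue depths, and destination port are placeholder values, not tuned recommendations:

```cpp
// RIO sketch: register buffers once, queue sends, reap completions in batches.
// Requires the Windows 8 / Server 2012 SDK or later. Error handling omitted.
#include <winsock2.h>
#include <ws2tcpip.h>
#include <mswsock.h>
#pragma comment(lib, "ws2_32.lib")

int main()
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    // The socket must be created with WSA_FLAG_REGISTERED_IO.
    SOCKET s = WSASocket(AF_INET, SOCK_DGRAM, IPPROTO_UDP,
                         NULL, 0, WSA_FLAG_REGISTERED_IO);

    // Fetch the RIO function table at runtime.
    GUID id = WSAID_MULTIPLE_RIO;
    RIO_EXTENSION_FUNCTION_TABLE rio = {0};
    DWORD bytes = 0;
    WSAIoctl(s, SIO_GET_MULTIPLE_EXTENSION_FUNCTION_POINTERS,
             &id, sizeof(id), &rio, sizeof(rio), &bytes, NULL, NULL);

    // One completion queue (NULL = polling mode, no kernel notification),
    // one request queue bound to the socket. Depths are placeholders.
    RIO_CQ cq = rio.RIOCreateCompletionQueue(1024, NULL);
    RIO_RQ rq = rio.RIOCreateRequestQueue(s, 0, 1, 1024, 1, cq, cq, NULL);

    // Register the payload buffer once up front; RIO never re-locks it.
    static char payload[1024] = {};
    RIO_BUFFERID pktid = rio.RIORegisterBuffer(payload, sizeof(payload));
    RIO_BUF pkt = { pktid, 0, sizeof(payload) };

    // Remote addresses also live in a registered buffer (SOCKADDR_INET).
    static SOCKADDR_INET addr = {};
    addr.Ipv4.sin_family = AF_INET;
    addr.Ipv4.sin_port = htons(9);   // placeholder port
    RIO_BUFFERID addrid = rio.RIORegisterBuffer((char*)&addr, sizeof(addr));
    RIO_BUF dest = { addrid, 0, sizeof(addr) };

    // Queue many sends without one system call per packet...
    for (int i = 0; i < 1024; ++i)
        rio.RIOSendEx(rq, &pkt, 1, NULL, &dest, NULL, NULL, 0, NULL);

    // ...then dequeue completions in batches, entirely in user mode.
    RIORESULT results[256];
    ULONG n = rio.RIODequeueCompletion(cq, results, 256);
    (void)n;

    closesocket(s);
    WSACleanup();
    return 0;
}
```

In a real sender you would rotate the destination address buffer across the hosts list (one registered SOCKADDR_INET per host, or a slice of one large registered region) and keep the request queue topped up as completions drain.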

Upvotes: 3

vhallac

Reputation: 13907

I would recommend moving to asynchronous I/O to speed things up a bit here. The main problem with sending packets one at a time is that you cannot queue up the next packet while the network stack is processing the current one.
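On Winsock, overlapped I/O is the usual way to keep several sends in flight on one socket. A rough sketch, with error handling omitted and the in-flight depth, port, and packet size as assumed placeholder values:

```cpp
// Overlapped-I/O sketch: keep INFLIGHT WSASendTo() calls pending at once
// instead of one blocking sendto() at a time. Error handling omitted.
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

enum { INFLIGHT = 64 };   // assumed depth; tune experimentally

int main()
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);
    SOCKET s = WSASocket(AF_INET, SOCK_DGRAM, IPPROTO_UDP,
                         NULL, 0, WSA_FLAG_OVERLAPPED);

    char packet[1024] = {};
    WSABUF buf = { sizeof(packet), packet };
    sockaddr_in dest = {};
    dest.sin_family = AF_INET;
    dest.sin_port = htons(9);   // placeholder port

    // One OVERLAPPED slot per in-flight send; start events signaled so
    // every slot is "free" on the first pass.
    WSAOVERLAPPED ov[INFLIGHT] = {};
    for (int i = 0; i < INFLIGHT; ++i) {
        ov[i].hEvent = WSACreateEvent();
        WSASetEvent(ov[i].hEvent);
    }

    for (int i = 0; ; i = (i + 1) % INFLIGHT) {
        // Wait only if this slot's previous send is still pending.
        WSAWaitForMultipleEvents(1, &ov[i].hEvent, TRUE, WSA_INFINITE, FALSE);
        WSAResetEvent(ov[i].hEvent);

        dest.sin_addr.S_un.S_addr = /* hosts[c] */ 0;   // next target here
        DWORD sent = 0;
        WSASendTo(s, &buf, 1, &sent, 0,
                  (sockaddr*)&dest, sizeof(dest), &ov[i], NULL);
    }
}
```

With a depth of 64 the stack always has queued work, so the sender is no longer gated on one round trip through the kernel per packet.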

Alternatively, you can go for a thread pool approach: you fire up a certain number of worker threads, and each one picks up a client from a FIFO and sends data to it. When a thread is done with its client, it puts the client back in the FIFO and picks up a new one. You can fill up the pipeline - but not swamp it - by tuning the number of worker threads.

Upvotes: 2
