Sayantan Ghosh

Reputation: 1056

UDP server consuming high CPU

I am observing high CPU usage in my UDP server implementation, which runs an infinite loop and expects 15 packets of 1.5 KB each every millisecond. The code looks like this:

struct RecvContext
{

    enum { BufferSize = 1600 };

    RecvContext() 
    { 
        senderSockAddrLen = sizeof(sockaddr_storage);
        memset(&overlapped, 0, sizeof(OVERLAPPED));
        overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        memset(&sendersSockAddr, 0, sizeof(sockaddr_storage));
        buffer.clear();
        buffer.resize(BufferSize);
        wsabuf.buf = (char*)buffer.data();
        wsabuf.len = ULONG(buffer.size());
    }

    void CloseEventHandle()
    {
        // CreateEvent returns NULL on failure, not INVALID_HANDLE_VALUE.
        if (overlapped.hEvent != NULL)
        {
            CloseHandle(overlapped.hEvent);
            overlapped.hEvent = NULL;
        }
    }

    OVERLAPPED overlapped;
    int senderSockAddrLen;
    sockaddr_storage sendersSockAddr;
    std::vector<uint8_t> buffer;
    WSABUF wsabuf;
};

void Receive()
{
    DWORD flags = 0, bytesRecv = 0;

    SOCKET sockHandle =...;

    while (/* stopping condition */)
    {

        std::shared_ptr<RecvContext> _recvContext = std::make_shared<RecvContext>();

        if (SOCKET_ERROR == WSARecvFrom(sockHandle, &_recvContext->wsabuf, 1, nullptr, &flags, (sockaddr*)&_recvContext->sendersSockAddr,
            (LPINT)&_recvContext->senderSockAddrLen, &_recvContext->overlapped, nullptr))
        {
            if (WSAGetLastError() != WSA_IO_PENDING)
            {
                //error
            }
            else
            {
                if (WSA_WAIT_FAILED == WSAWaitForMultipleEvents(1, &_recvContext->overlapped.hEvent, FALSE, INFINITE, FALSE))
                {
                    //error
                }

                if (!WSAGetOverlappedResult(sockHandle, &_recvContext->overlapped, &bytesRecv, FALSE, &flags))
                {
                    //error
                }
            }
        }

        _recvContext->CloseEventHandle();
        // async task to process _recvContext->buffer
    }
}

The CPU consumption of this UDP server is very high even when the packets are not processed after receipt. How can the CPU consumption be reduced?

Upvotes: 0

Views: 563

Answers (1)

David Schwartz

Reputation: 182827

You've chosen about the most inefficient combination of mechanisms imaginable.

  1. Why use overlapped I/O if you're only going to pend one operation and then wait for it to complete?

  2. Why use an event, which is about the slowest notification scheme that Windows has?

  3. Why do you only pend one operation at a time? You're forcing the implementation to stash datagrams in its own buffers and then copy them into yours.

  4. Why do you post the receive operation right before you're going to wait for it to complete rather than right after the previous one completes?

  5. Why do you create a new receive context each time instead of re-using the existing buffer, event, and so on?

Use IOCP (I/O completion ports). Windows events are very slow and heavy.

Post lots of operations. You want the operating system to be able to put the datagram right in your buffer rather than having to allocate another buffer that it copies data into and out of.

Re-use your buffers and allocate all your receive buffers from a contiguous pool rather than fragmenting them throughout process memory. The memory used for your buffers has to be pinned and you want to minimize the amount of pinning needed.
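The pooling advice above could look roughly like the following. This is a minimal sketch, assuming fixed-size slots carved out of a single `std::vector` allocation. `BufferPool`, `Acquire`, and `Release` are hypothetical names (nothing here is a Winsock API), and the class is not thread-safe as written.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size buffer pool: one contiguous allocation, split into slots.
// Not thread-safe; add locking if several threads acquire or release slots.
class BufferPool
{
public:
    BufferPool(std::size_t slotCount, std::size_t slotSize)
        : slotSize_(slotSize), storage_(slotCount * slotSize)
    {
        freeSlots_.reserve(slotCount);
        // Push in reverse order so that slot 0 is handed out first.
        for (std::size_t i = slotCount; i > 0; --i)
            freeSlots_.push_back(storage_.data() + (i - 1) * slotSize);
    }

    // Returns a free slot, or nullptr when every slot is in use.
    std::uint8_t* Acquire()
    {
        if (freeSlots_.empty())
            return nullptr;
        std::uint8_t* slot = freeSlots_.back();
        freeSlots_.pop_back();
        return slot;
    }

    // Gives a slot previously obtained from Acquire() back to the pool.
    void Release(std::uint8_t* slot) { freeSlots_.push_back(slot); }

    std::size_t SlotSize() const { return slotSize_; }

private:
    std::size_t slotSize_;
    std::vector<std::uint8_t> storage_;    // all receive buffers live in this one block
    std::vector<std::uint8_t*> freeSlots_; // slots not currently posted or being processed
};
```

Each acquired slot would be used as the `WSABUF` buffer for one posted receive and released once the datagram has been processed, so no allocation happens on the receive path.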

Re-post operations as soon as they complete. Don't process them and then re-post. There's no reason to delay starting the operation. You can probably ignore this if you followed all the other suggestions because you wouldn't have a "spare" buffer to post anyway.

Alternatively, you can probably get away with a thread that spins on a blocking receive operation. Just make sure the loop is as tight as possible: as soon as a receive returns, hand the buffer it just filled to another thread for processing and immediately start the next receive into a different, already-allocated buffer.
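The shape of such a loop might be sketched as below. This is only an illustration under stated assumptions: the `receive` callable stands in for a blocking `recvfrom` on the real socket, a single worker thread does the processing, and `BlockingQueue`, `Filled`, and `RunReceiver` are hypothetical names.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue for passing buffer indices between the two threads.
template <typename T>
class BlockingQueue
{
public:
    void Push(T value)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    T Pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

struct Filled { int index; long size; };  // index < 0 tells the worker to stop

// Assumption: receive() blocks until a datagram arrives, returns its size,
// and returns a negative value when the server should shut down.
using ReceiveFn = std::function<long(std::uint8_t*, std::size_t)>;
using ProcessFn = std::function<void(const std::uint8_t*, std::size_t)>;

void RunReceiver(ReceiveFn receive, ProcessFn process, int bufferCount, std::size_t bufferSize)
{
    // All buffers are allocated once, in one contiguous block.
    std::vector<std::uint8_t> pool(static_cast<std::size_t>(bufferCount) * bufferSize);
    BlockingQueue<int> freeBuffers;
    BlockingQueue<Filled> filledBuffers;
    for (int i = 0; i < bufferCount; ++i)
        freeBuffers.Push(i);

    std::thread worker([&] {
        for (;;) {
            Filled f = filledBuffers.Pop();
            if (f.index < 0)
                return;
            process(&pool[static_cast<std::size_t>(f.index) * bufferSize],
                    static_cast<std::size_t>(f.size));
            freeBuffers.Push(f.index);  // recycle the buffer
        }
    });

    // Receive loop: take a free buffer, fill it, hand it off, repeat.
    for (;;) {
        int index = freeBuffers.Pop();
        long received = receive(&pool[static_cast<std::size_t>(index) * bufferSize], bufferSize);
        if (received < 0)
            break;
        filledBuffers.Push({index, received});
    }
    filledBuffers.Push({-1, 0});
    worker.join();
}
```

If the worker falls behind, the receive loop blocks waiting for a free buffer and datagrams queue up in the socket's own receive buffer, so `bufferCount` would need to be sized for the expected burst.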

Upvotes: 2
