Reputation: 2563
I have a large volume of data coming in to my server over a UDP socket. The data needs to be processed and then sent on to our clients. Because the incoming data is large, I am thinking of optimizing this with DMA transfers: I want the network data to be stored directly in RAM so that user space applications can access it quickly.
I have limited experience with device drivers and I don't know much about network sockets. My questions are:
Can DMA transfer be used in this scenario?
Would it actually reduce the overhead and improve the performance?
Upvotes: 1
Views: 937
Nowadays most network adapters use DMA to deliver packets to host memory on receive and grab packet data from it when transmitting. But in regular kernel drivers, the DMA memory buffers aren't accessible from applications. The data is copied between kernel-side and user-side buffers.
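To make the copy on the regular path concrete, here is a minimal sketch of an ordinary UDP receive in C. The helper name `udp_loopback_roundtrip` and the one-second timeout are assumptions made for illustration only. It sends a datagram to itself over loopback, and the `recvfrom()` call is the point where the kernel copies the payload from its own socket buffer into the application's buffer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends `msg` to a loopback UDP socket and reads it
 * back. Returns the number of bytes copied into `out`, or -1 on error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *out, size_t out_len)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 1, 0 };   /* assumed 1 s receive timeout */
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a free port */

    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto done;
    if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto done;

    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* The datagram sits in a kernel socket buffer at this point;
     * recvfrom() copies it into the user-space buffer `out`. */
    n = recvfrom(rx, out, out_len, 0, NULL, NULL);

done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Calling `udp_loopback_roundtrip("hello", buf, sizeof(buf))` from a `main()` exercises the whole path. With large or frequent datagrams, that per-packet copy and system call are the overhead the question is asking about.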
If the requirement in question ("user space applications can access them quickly") implies avoiding such data copies between kernel-side and user-side buffers, then one should consider so-called kernel-bypass techniques, for example the PACKET_MMAP API of the Linux kernel, or DPDK.
In a nutshell, PACKET_MMAP allows an application to set up a memory buffer shared between the kernel and user space and to access packet data in it directly. This is what the capture and transmission workflows look like:
Capture process:

    [setup]     socket() -------> creation of the capture socket
                setsockopt() ---> allocation of the circular buffer (ring)
                                  option: PACKET_RX_RING
                mmap() ---------> mapping of the allocated buffer to the
                                  user process

    [capture]   poll() ---------> to wait for incoming packets

    [shutdown]  close() --------> destruction of the capture socket and
                                  deallocation of all associated
                                  resources.
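The capture steps above can be sketched in C roughly as follows. This is an illustration under assumptions, not a reference implementation: the helper names (`rx_ring_geometry`, `rx_frame_at`, `rx_ring_open`, `rx_ring_wait_frame`) are made up for this example, the default TPACKET_V1 frame layout is assumed, and opening an `AF_PACKET` socket normally requires root or `CAP_NET_RAW`.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <poll.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fills the ring request. block_size should be a multiple of the page
 * size, and tp_frame_nr must equal frames-per-block times block_nr. */
void rx_ring_geometry(struct tpacket_req *req, unsigned block_size,
                      unsigned block_nr, unsigned frame_size)
{
    memset(req, 0, sizeof(*req));
    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_size = frame_size;
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
}

/* Returns the header of frame `idx`; frames never straddle blocks. */
struct tpacket_hdr *rx_frame_at(void *ring, const struct tpacket_req *req,
                                unsigned idx)
{
    unsigned per_block = req->tp_block_size / req->tp_frame_size;
    char *block = (char *)ring + (size_t)(idx / per_block) * req->tp_block_size;
    return (struct tpacket_hdr *)(block + (size_t)(idx % per_block) * req->tp_frame_size);
}

/* [setup] socket() + setsockopt(PACKET_RX_RING) + mmap().
 * Returns 0 on success, -1 on error (for example, missing privileges). */
int rx_ring_open(const struct tpacket_req *req, int *fd_out, void **ring_out)
{
    size_t len = (size_t)req->tp_block_size * req->tp_block_nr;
    void *ring;
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0) {
        close(fd);
        return -1;
    }
    ring = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) {
        close(fd);
        return -1;
    }
    *fd_out = fd;
    *ring_out = ring;
    return 0;
}

/* [capture] Waits for frame `idx` to be handed to user space.
 * Returns the captured length, 0 on timeout, or -1 on error. */
int rx_ring_wait_frame(int fd, void *ring, const struct tpacket_req *req,
                       unsigned idx, int timeout_ms)
{
    struct tpacket_hdr *hdr = rx_frame_at(ring, req, idx);

    if (!(hdr->tp_status & TP_STATUS_USER)) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc <= 0)
            return rc;
        if (!(hdr->tp_status & TP_STATUS_USER))
            return 0;
    }
    /* Packet bytes start at (char *)hdr + hdr->tp_mac and are read in
     * place. Afterwards the caller hands the frame back to the kernel
     * with hdr->tp_status = TP_STATUS_KERNEL. */
    return (int)hdr->tp_snaplen;
}
```

A caller would typically loop over the frame indices in order, wrapping around at `tp_frame_nr`, and finish with `munmap()` and `close()` for the shutdown step. Note that a packet socket delivers whole link-layer frames, so the application has to locate the UDP payload inside each frame itself.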
Transmission process:

    [setup]         socket() -------> creation of the transmission socket
                    setsockopt() ---> allocation of the circular buffer (ring)
                                      option: PACKET_TX_RING
                    bind() ---------> bind transmission socket with a network interface
                    mmap() ---------> mapping of the allocated buffer to the
                                      user process

    [transmission]  poll() ---------> wait for free packets (optional)
                    send() ---------> send all packets that are set as ready in
                                      the ring
                                      The flag MSG_DONTWAIT can be used to return
                                      before end of transfer.

    [shutdown]      close() --------> destruction of the transmission socket and
                                      deallocation of all associated resources.
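A similarly hedged C sketch of the transmission steps is shown here. The helper names `tx_ring_open` and `tx_frame_queue` are hypothetical, the default TPACKET_V1 layout is assumed (packet data placed at `TPACKET_HDRLEN - sizeof(struct sockaddr_ll)` from the start of the frame), and the socket again needs root or `CAP_NET_RAW`.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/* [setup] socket() + setsockopt(PACKET_TX_RING) + bind() + mmap().
 * Returns 0 on success, -1 on error (for example, missing privileges). */
int tx_ring_open(const char *ifname, const struct tpacket_req *req,
                 int *fd_out, void **ring_out)
{
    struct sockaddr_ll ll;
    size_t len = (size_t)req->tp_block_size * req->tp_block_nr;
    void *ring;
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_PACKET, PACKET_TX_RING, req, sizeof(*req)) < 0)
        goto fail;

    memset(&ll, 0, sizeof(ll));
    ll.sll_family = AF_PACKET;
    ll.sll_protocol = htons(ETH_P_ALL);
    ll.sll_ifindex = (int)if_nametoindex(ifname);
    if (ll.sll_ifindex == 0 ||
        bind(fd, (struct sockaddr *)&ll, sizeof(ll)) < 0)
        goto fail;

    ring = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED)
        goto fail;
    *fd_out = fd;
    *ring_out = ring;
    return 0;
fail:
    close(fd);
    return -1;
}

/* [transmission] Places one link-layer frame into a ring slot and marks
 * it ready. Returns 0 on success, -1 if the slot is busy or too small. */
int tx_frame_queue(void *frame, unsigned frame_size,
                   const void *pkt, unsigned len)
{
    struct tpacket_hdr *hdr = frame;
    size_t off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    if (hdr->tp_status != TP_STATUS_AVAILABLE)
        return -1;              /* slot not yet released by the kernel */
    if (frame_size < off || len > frame_size - off)
        return -1;
    memcpy((char *)frame + off, pkt, len);
    hdr->tp_len = len;
    hdr->tp_status = TP_STATUS_SEND_REQUEST;
    return 0;
}
```

After queuing one or more frames, a single `send(fd, NULL, 0, 0)` (optionally with `MSG_DONTWAIT`) asks the kernel to transmit everything marked ready, so the system call cost is shared by the whole batch. The `memcpy()` is only there to keep the sketch short; an application can also build the frame in place inside the ring slot.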
Packets are handled in batches (blocks), so one wake-up or system call covers many packets instead of one, which cuts down on costly transitions between kernel and user space.
One can learn more about PACKET_MMAP from the documentation (link above). Furthermore, this mechanism is already exercised by libpcap-based tools (tcpdump, wireshark and the like).
Upvotes: 1