Reputation: 51
I have been reading about performance tuning of Linux to get the fastest packet processing times when receiving financial market data. I see that when the NIC receives a packet, it puts it in memory via DMA and then raises a HardIRQ, which in turn schedules NAPI and raises a SoftIRQ. The SoftIRQ then uses NAPI/the device driver to read data from the RX buffers by polling, but each pass only runs for a limited budget (net.core.netdev_budget, which defaults to 300 packets). These questions are in reference to a real server running Ubuntu with a Solarflare NIC. My questions are below:
1. If each HardIRQ raises a SoftIRQ, and the device driver reads multiple packets in one go (netdev_budget), what happens to the SoftIRQs raised by each of the packets that were drained from the RX buffer in that one pass (each packet received will raise a hard and then a soft IRQ)? Are these queued?
2. Why does NAPI use polling to drain the RX buffer? The system has just generated a SoftIRQ and is reading the RX buffer, so why the polling?
3. Presumably, draining of the RX buffer via the SoftIRQ will only happen from one specific RX buffer and not across multiple RX buffers? If so, can increasing netdev_budget delay the processing/draining of other RX buffers? Or can this be mitigated by assigning different RX buffers to different cores?
4. There are settings to ensure that HardIRQs are raised and handled immediately. However, SoftIRQs may be processed at a later time. Are there settings/configs to ensure that SoftIRQs related to network RX are also handled at top priority and without delay?
Upvotes: 4
Views: 6068
Reputation: 1656
I wrote a comprehensive blog post explaining the answers to your questions and everything else about tuning, optimizing, profiling, and understanding the entire Linux networking stack here.
Answers to your questions:
Softirqs raised by the driver while a softirq is already being processed do nothing. This is because the NAPI helper code first checks whether NAPI is already running before attempting to raise the softirq. Even if NAPI did not check, you can see from the softirq source that softirqs are implemented as a bit vector. This means a softirq can only be 1 (pending) or 0 (not pending). While it is set to 1, additional calls to set it to 1 have no effect.
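As a very rough user-space model of that bit-vector behaviour (this mirrors the idea, not the kernel code; the names and the bit index are made up for illustration):

    /* Simplified model of how pending softirqs are tracked as a per-CPU
     * bit vector: raising a softirq just sets a bit, so raising it again
     * while it is already pending changes nothing. */
    #include <stdio.h>

    #define NET_RX_SOFTIRQ 3          /* illustrative index: one bit per softirq type */

    static unsigned long pending;     /* the "pending softirqs" bit vector */

    static void raise_softirq(int nr)
    {
        pending |= 1UL << nr;         /* idempotent: setting an already-set bit is a no-op */
    }

    static void do_softirq(void)
    {
        if (pending & (1UL << NET_RX_SOFTIRQ)) {
            pending &= ~(1UL << NET_RX_SOFTIRQ);
            printf("NET_RX handler runs once, no matter how many times it was raised\n");
        }
    }

    int main(void)
    {
        raise_softirq(NET_RX_SOFTIRQ);  /* first packet's hardirq */
        raise_softirq(NET_RX_SOFTIRQ);  /* later packets: no extra pending work is queued */
        raise_softirq(NET_RX_SOFTIRQ);
        do_softirq();                   /* the handler runs exactly once */
        return 0;
    }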
The softirq is used to start the NAPI poll loop and to bound it so it does not consume 100% of the CPU. The NAPI poll loop is just a for loop, and the softirq code manages how much time it can spend and how much budget it has.
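A simplified, user-space sketch of that budget logic is below. The helper names, the fake 1000-packet backlog, and the 64-packet per-poll weight are assumptions for illustration only; the real NET_RX softirq handler (net_rx_action) also enforces a time limit (net.core.netdev_budget_usecs on newer kernels).

    #include <stdio.h>

    #define NETDEV_BUDGET 300            /* net.core.netdev_budget */

    static int ring_backlog = 1000;      /* pretend 1000 packets are waiting in one RX ring */

    /* Stand-in for the driver's NAPI poll callback: drain up to 'weight'
     * packets from the RX ring and return how many were actually processed. */
    static int napi_poll(int weight)
    {
        int done = ring_backlog < weight ? ring_backlog : weight;
        ring_backlog -= done;
        return done;
    }

    /* Simplified shape of the NET_RX softirq handler. */
    static void net_rx_action(void)
    {
        int budget = NETDEV_BUDGET;

        while (budget > 0) {
            int done = napi_poll(budget < 64 ? budget : 64); /* per-poll weight, typically 64 */
            budget -= done;
            if (done == 0) {             /* ring drained: NAPI re-enables the NIC interrupt */
                printf("ring drained, interrupts re-enabled\n");
                return;
            }
        }
        printf("budget exhausted: softirq is re-raised, remaining packets wait\n");
    }

    int main(void)
    {
        net_rx_action();
        return 0;
    }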
Each CPU processing packets can spend the full budget. So, if the budget is set to 300 and you have 2 CPUs, each CPU can process 300 packets, for a total of 600. This is only true if your NIC supports multiple RX queues and you've distributed the IRQs to separate CPUs for processing. If your NIC doesn't, you can use Receive Packet Steering (RPS) to help with this. See my blog post above for more information.
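The core idea of RPS, as a toy sketch (the hash function and CPU list here are made up; the real mask would come from /sys/class/net/<dev>/queues/rx-<n>/rps_cpus):

    #include <stdio.h>
    #include <stdint.h>

    /* CPUs enabled for RPS on this RX queue (illustrative values). */
    static const int rps_cpus[] = { 2, 3, 6, 7 };

    /* Trivial stand-in for the kernel's flow hash over the address/port 4-tuple. */
    static uint32_t flow_hash(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
    {
        return (saddr ^ daddr ^ ((uint32_t)sport << 16 | dport)) * 2654435761u;
    }

    int main(void)
    {
        /* One flow: same 4-tuple -> same hash -> consistently steered to one CPU. */
        uint32_t hash = flow_hash(0x0a000001, 0x0a000002, 40000, 12345);
        int cpu = rps_cpus[hash % (sizeof(rps_cpus) / sizeof(rps_cpus[0]))];

        printf("packets for this flow are steered to CPU %d\n", cpu);
        return 0;
    }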
No, there are no settings for this. Note that softirqs run on the same CPU that raised them. So, if you set your hardirq handler for RX queue 1 to CPU 16, then the softirq will run on CPU 16. One thing you can do is pin your hardirqs to specific CPUs and pin the application that will consume that data to those same CPUs. Pin all other applications (cron jobs, background tasks, etc.) to other CPUs; this ensures that only the hardirq handler, the softirq, and the application that will process the data run there.
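For example, a minimal sketch of pinning the consuming process to the same CPU as the RX queue's hardirq, using sched_setaffinity(2). CPU 16 is just the example number from above; it should match wherever /proc/irq/<N>/smp_affinity points the queue's interrupt.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(16, &set);            /* same CPU as the RX queue's hardirq/softirq */

        /* pid 0 means "this process" */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* ... open sockets and consume market data here, now pinned to CPU 16 ... */
        printf("pinned to CPU 16\n");
        return 0;
    }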
If you desire extremely low latency network packet reads, you should try using a newer Linux kernel networking feature called busy polling. It can help minimize packet processing latency, but using it will increase CPU usage.
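For example, a minimal sketch of opting a single socket into busy polling via the SO_BUSY_POLL socket option (available since roughly kernel 3.11). The 50-microsecond value and the UDP socket are assumptions for illustration; busy polling can also be enabled globally via the net.core.busy_read and net.core.busy_poll sysctls, and per-socket values above the sysctl limit may require CAP_NET_ADMIN.

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* e.g. a multicast market-data socket */
        int busy_poll_usecs = 50;                  /* example: spin for up to 50 us per read */

        if (fd < 0) {
            perror("socket");
            return 1;
        }

    #ifdef SO_BUSY_POLL
        /* With this set, blocking reads/poll on this socket busy-poll the
         * driver's RX ring for up to busy_poll_usecs before sleeping. */
        if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                       &busy_poll_usecs, sizeof(busy_poll_usecs)) != 0)
            perror("setsockopt(SO_BUSY_POLL)");
    #endif

        /* ... bind/join the multicast group and read as usual ... */
        return 0;
    }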
Upvotes: 6