yaron

Reputation: 383

Full Linux page cache causes drops at the NIC

I have a DPDK 19 application that reads from a NIC (MT27800 Family [ConnectX-5] 100G) with 32 RX queues and RSS.

So there are 32 processes that receive traffic from the NIC with DPDK. Each process reads from a different queue, copies the data from the mbufs into allocated memory, accumulates 6 MB, and sends it via a lockless queue to another thread that only writes the data to disk. As a result, the I/O writes are cached in Linux memory.

All processes run with CPU affinity, and isolcpus is set in GRUB.

This is a little pseudocode of what happens in each of the 32 processes that reads from its queue; I can't post the real code, it is too much:

MainFunction()
{
   char * local_buf = new char[MAX];
   size_t offset = 0;
   int nBufs = rte_eth_rx_burst(pi_nPort, pi_nQNumber, m_mbufs, 216);
   for(mbuf in first nBufs of m_mbufs)
   {
       memcpy(local_buf+offset, GetData(mbuf), len); //accumulate into buf
       offset += len;
       if(offset >= MAX)  // MAX is ~6MB
       {
          PushToQueue(local_buf);
          local_buf = new char[MAX];
          offset = 0;
       }
       rte_pktmbuf_free(mbuf);
   }
}

WriterThreadMainFunc()
{
     while(QueueNotEmpty)
     {
          buf = PullFromQueue();
          WriteToDisk(buf);
          delete[] buf;
     }
}

When the server memory is completely taken by the page cache (I know it is still "available"), I start seeing drops at the NIC.

If I delete the data from disk every minute, the cached memory is released back to free and there are no drops at the NIC, so the drops are clearly linked to the cached data. Until the first drops, the application can receive without drops for 2 hours. The processes don't use much memory; each process is at 500 MB.

How can I avoid the drops at the NIC?

               total        used        free      shared  buff/cache   available
Mem:           125G         77G        325M         29M         47G         47G
Swap:          8.0G        256K        8.0G

I use CentOS 7.9, Linux 3.10.0-1160.49.1.el7.x86_64.

Upvotes: 0

Views: 514

Answers (1)

Vipin Varghese

Reputation: 4798

The DPDK API rte_eth_rx_burst uses the mempool (pktmbuf pool) memory region to hold the metadata and the Ethernet frame. Internally, in each rx_burst cycle:

  1. It checks the local per-core mempool cache for a pkt_mbuf for the physical NIC to DMA into.
  2. If no mbuf is found in the local cache, it acquires the mempool lock and gets mbufs from the mempool.
  3. Every mbuf handed out has its ref_cnt set to 1, indicating that it is in use and must not be freed.
  4. Unless tx_burst or rte_pktmbuf_free is invoked, the mbuf is never returned to the local cache or the mempool for reuse.

Hence, as the code snippet shows, the performance of WriterThreadMainFunc affects the availability of mbufs in the mempool. That is, if the rate of rx_burst (in million packets per second) is greater than that of

  1. the function PullFromQueue,
  2. or the function WriteToDisk,
  3. or both,

then mbufs are freed more slowly than rx_burst consumes them and the pool eventually runs dry. To validate this, one can

  1. run dpdk-proc-info for stats and xstats, and check the counter rx_nombuf,
  2. or integrate rte_eth_stats_get and rte_eth_xstats_get into the application for the same counter.

Normally, open files (especially in read-write mode) are cached in 4K pages or in transparent huge pages (except when THP is set to never) for performance. Based on the conversation in the comments, it looks like once the page cache fills up, disk I/O runs slower, which makes WriterThreadMainFunc run slower. To check this behaviour, as suggested in the comments, please

  1. use echo 1 | sudo tee /proc/sys/vm/drop_caches to drop the page cache,
  2. try calling fflush and fsync periodically,
  3. or create a ramdisk and open the file for reading and writing on the ramdisk.

Once the problem is isolated, you can use setbuf(f, NULL) at the start to disable user-space buffering entirely.

Note: there are a multitude of other options for the current requirement too, such as creating files per port-queue, per flow, or per flow-port-queue with mmap.

Upvotes: 0
