Receiving raw socket packets with microsecond-level accuracy

Question

I am writing code that receives raw Ethernet packets (no TCP/UDP) from a server every 1 ms. For every packet received, my application has to reply with 14 raw packets. If the server doesn't receive the 14 packets before it sends its next packet (scheduled every 1 ms), it raises an alarm and the application has to break out. The server-client communication is a one-to-one link.

The server is hardware (an FPGA) that generates packets at a precise 1 ms interval. The client application runs on a Linux (RHEL/CentOS 7) machine with a 10G Solarflare NIC.

My first version of the code looks like this:

while (1)
{
    /* Spin until the next packet from the server arrives. */
    while (1)
    {
        numbytes = recvfrom(sockfd, buf, sizeof(buf), 0, NULL, NULL);
        if (numbytes > 0)
        {
            // Some more lines here, to read packet number
            break;
        }
    }
    /* Reply with 14 raw packets. */
    for (i = 0; i < 14; i++)
    {
        if (sendto(sockfd, (void *)(sym), sizeof(sym), 0, NULL, 0) < 0)
            perror("Send failed");
    }
}

I measure the receive time by taking timestamps (using clock_gettime) immediately before and after the recvfrom call. I compute the difference between these timestamps and print it whenever it falls outside the allowable range of 900-1100 us.
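A minimal sketch of this measurement follows. It is not the actual application code: the helper names elapsed_us and interval_ok are hypothetical, and the use of CLOCK_MONOTONIC in the usage comment is an assumption, since any clock accepted by clock_gettime could be in use.

```c
#include <stdint.h>
#include <time.h>

/* Allowed receive interval in microseconds (the 900-1100 us range). */
#define MIN_INTERVAL_US 900
#define MAX_INTERVAL_US 1100

/* Microseconds elapsed from *start to *end (hypothetical helper). */
static int64_t elapsed_us(const struct timespec *start, const struct timespec *end)
{
    int64_t ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
               + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);
    return ns / 1000;
}

/* Returns 1 if the interval lies inside the allowed range, 0 otherwise. */
static int interval_ok(int64_t us)
{
    return us >= MIN_INTERVAL_US && us <= MAX_INTERVAL_US;
}

/*
 * Intended use around the receive call (CLOCK_MONOTONIC is an assumption):
 *
 *   struct timespec t0, t1;
 *   clock_gettime(CLOCK_MONOTONIC, &t0);
 *   numbytes = recvfrom(sockfd, buf, sizeof(buf), 0, NULL, NULL);
 *   clock_gettime(CLOCK_MONOTONIC, &t1);
 *   int64_t us = elapsed_us(&t0, &t1);
 *   if (!interval_ok(us))
 *       printf("Decode Time : %lld\n", (long long)us);
 */
```

Doing the subtraction in nanoseconds before dividing avoids a wrong result when the nanosecond field wraps between the two timestamps.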

The problem I am facing is that the packet receive time fluctuates, something like this (the prints are in microseconds):

Decode Time : 1234
Decode Time : 762
Decode Time : 1593
Decode Time : 406
Decode Time : 1703
Decode Time : 257
Decode Time : 1493
Decode Time : 514
and so on..

Sometimes the decode time exceeds 2000 us and the application breaks.

With this version, the application breaks after anywhere between 2 seconds and a few minutes.

Options I have tried so far:

Setting the application's CPU affinity to a particular isolated core.
Setting the scheduling priority to maximum with SCHED_FIFO.
Increasing the socket buffer sizes.
Setting the network interface interrupt affinity to the same core that runs the application.
Spinning over recvfrom using poll()/select() calls.
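For reference, the first two options can be sketched roughly as below. This is only an illustration under assumptions, not the exact code in the application: the helper names pin_to_cpu and set_fifo_max_priority are hypothetical, and the core number passed in is whatever core has been isolated on the machine.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t, CPU_SET and sched_setaffinity */
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Pin the calling thread to a single CPU (hypothetical helper).
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = caller */
}

/* Switch the calling thread to SCHED_FIFO at the highest priority
 * (hypothetical helper). Returns 0 on success, -1 with errno set. */
static int set_fifo_max_priority(void)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

Both helpers would be called once at startup, before entering the receive loop. Switching to SCHED_FIFO normally requires root or the CAP_SYS_NICE capability, so set_fifo_max_priority fails with EPERM for an unprivileged process.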

All these options give a significant improvement over the initial version of the code. The application now runs for about 1-2 hours, but this is still not enough.

A few observations:

I get a huge dump of these decode time prints whenever I open SSH sessions to the Linux machine while the application is running (which makes me think that network traffic on the other 1G Ethernet interface is interfering with the 10G Ethernet interface).
The application performs better on RHEL (run times of about 2-3 hours) than on CentOS (run times of about 30 minutes to 1.5 hours).
The run time also varies between Linux machines with different hardware configurations running the same OS.

Please suggest any other methods to improve the run time of the application.

Thanks in advance.
