nkarstens

Reputation: 157

High performance packet handling in Linux

I’m working on a packet reshaping project in Linux using the BeagleBone Black. Basically, packets are received on one VLAN, modified, and then sent out on a different VLAN. The process is bidirectional: the VLANs are not designated as input-only or output-only. It’s similar to a network bridge, except that packets are altered (sometimes fairly significantly) in transit.

I’ve tried two different methods for accomplishing this:

  1. Creating a user space application that opens raw sockets on both interfaces. All packet processing (including bridging) is handled in the application. (See the first sketch after this list.)
  2. Setting up a software bridge (using the kernel bridge module) and adding a kernel module that installs a netfilter hook in post routing (NF_BR_POST_ROUTING). All packet processing is handled in the kernel. (See the second sketch after this list.)
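For context, a minimal sketch of the option 1 socket setup (simplified; the interface name and error handling are illustrative, and the actual application also attaches a PACKET_MMAP ring, shown further below):

    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>
    #include <net/if.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Open a raw AF_PACKET socket bound to one interface, e.g. "eth0.10". */
    static int open_raw(const char *ifname)
    {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0)
            return -1;

        struct sockaddr_ll sll;
        memset(&sll, 0, sizeof(sll));
        sll.sll_family   = AF_PACKET;
        sll.sll_protocol = htons(ETH_P_ALL);
        sll.sll_ifindex  = if_nametoindex(ifname);

        if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }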
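And a rough sketch of the option 2 hook registration (the packet rewriting itself is elided; this uses the nf_register_net_hook() API from kernels 4.13+, while older kernels use nf_register_hook()):

    #include <linux/module.h>
    #include <linux/netfilter.h>
    #include <linux/netfilter_bridge.h>
    #include <linux/skbuff.h>
    #include <net/net_namespace.h>

    /* Called for every frame leaving the bridge; rewrite the skb here. */
    static unsigned int reshape_hook(void *priv, struct sk_buff *skb,
                                     const struct nf_hook_state *state)
    {
        /* ... modify the frame in place ... */
        return NF_ACCEPT;
    }

    static struct nf_hook_ops reshape_ops = {
        .hook     = reshape_hook,
        .pf       = NFPROTO_BRIDGE,
        .hooknum  = NF_BR_POST_ROUTING,
        .priority = 0,
    };

    static int __init reshape_init(void)
    {
        return nf_register_net_hook(&init_net, &reshape_ops);
    }

    static void __exit reshape_exit(void)
    {
        nf_unregister_net_hook(&init_net, &reshape_ops);
    }

    module_init(reshape_init);
    module_exit(reshape_exit);
    MODULE_LICENSE("GPL");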

The second option appears to be around 4 times faster than the first. I’d like to understand why. I’ve brainstormed a bit and wondered if there is a substantial performance hit in rapidly switching between kernel and user space, or if something about the socket interface is inherently slow.

I think the user application is fairly optimized (for example, I’m using PACKET_MMAP), but it’s possible that it could be optimized further. I ran perf on the application and noticed that it spends a good deal of time (35%) in v7_flush_kern_dcache_area, so the data cache flushing is a likely candidate. If anyone has other suggestions for common ways to optimize packet processing, I can give them a try.
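In case it helps, this is roughly how the RX ring is attached (the block/frame sizes here are just example values, not the ones from my application; the caller checks for MAP_FAILED):

    #include <linux/if_packet.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/socket.h>

    /* Attach a PACKET_MMAP RX ring to an already-open raw socket fd. */
    static void *setup_rx_ring(int fd, struct tpacket_req *req)
    {
        req->tp_block_size = 4096;  /* one page per block */
        req->tp_frame_size = 2048;  /* two frames per block */
        req->tp_block_nr   = 64;
        req->tp_frame_nr   = 128;   /* block_nr * frames per block */

        if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0)
            return MAP_FAILED;

        /* The kernel writes received frames directly into this shared
         * mapping, so packets are read without a per-packet copy. */
        return mmap(NULL, (size_t)req->tp_block_nr * req->tp_block_size,
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }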

Upvotes: 0

Views: 1363

Answers (2)

schorsch_76

Reputation: 862

The performance of the user space application also depends on which syscall is used to monitor the sockets. When you need to handle a lot of sockets, epoll() is the fastest; select() performs very poorly with many sockets.

See this post explaining it: Why is epoll faster than select?
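If it's useful, a minimal sketch of such a loop, assuming fd_a and fd_b are the two raw socket descriptors from the question (the names are hypothetical):

    #include <sys/epoll.h>

    /* Monitor both raw sockets with a single epoll instance. */
    static void event_loop(int fd_a, int fd_b)
    {
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN };

        ev.data.fd = fd_a;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd_a, &ev);
        ev.data.fd = fd_b;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd_b, &ev);

        for (;;) {
            struct epoll_event events[2];
            int n = epoll_wait(epfd, events, 2, -1);
            for (int i = 0; i < n; i++) {
                /* read frames from events[i].data.fd, modify, forward */
            }
        }
    }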

Upvotes: 0

b4hand

Reputation: 9770

Context switches are expensive, and kernel to user space switches imply a context switch. You can see this article for exact numbers, but the stated durations are all on the order of microseconds.

You can also use lmbench to benchmark the real cost of context switches on your particular CPU.
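For example (assuming lmbench is installed), its lat_ctx benchmark reports context switch latency directly:

    lat_ctx -s 0 2

This measures the switch time between two processes with a zero-size working set; increasing the -s value shows how cache footprint adds to the cost.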

Upvotes: 1
