Marcus Wichelmann

Reputation: 812

Low throughput with XDP_TX in comparison with XDP_DROP/REDIRECT

I have developed an XDP program that filters packets based on some specific rules and then either drops them (XDP_DROP) or redirects them (xdp_redirect_map) to another interface. This program was able to process a synthetic load of ~11Mpps (the maximum my traffic generator can produce) on just four CPU cores.
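The question doesn't include the program itself, but the drop-vs-redirect decision it describes might look roughly like the following. This is a userspace C sketch for illustration only (the real code runs in the kernel as a BPF program and returns XDP_DROP or calls bpf_redirect_map()); the blocked UDP port is a hypothetical rule, not something from the question.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>

/* Verdicts mirroring the XDP actions the program chooses between. */
enum verdict { VERDICT_DROP, VERDICT_REDIRECT };

/* Classify a raw Ethernet/IPv4/UDP frame: drop traffic to a
 * hypothetical blocked UDP port, redirect everything else. */
enum verdict classify(const uint8_t *pkt, size_t len)
{
    const uint16_t blocked_port = 9999;     /* hypothetical rule */

    if (len < 14 + 20 + 8)                  /* eth + min IPv4 + UDP */
        return VERDICT_REDIRECT;
    if (pkt[12] != 0x08 || pkt[13] != 0x00) /* ethertype != IPv4 */
        return VERDICT_REDIRECT;

    size_t ihl = (pkt[14] & 0x0f) * 4;      /* IPv4 header length */
    if (pkt[14 + 9] != 17 || len < 14 + ihl + 8) /* not UDP */
        return VERDICT_REDIRECT;

    uint16_t dport;
    memcpy(&dport, pkt + 14 + ihl + 2, 2);  /* UDP destination port */
    return ntohs(dport) == blocked_port ? VERDICT_DROP : VERDICT_REDIRECT;
}
```

In the kernel version, the same bounds checks are what the BPF verifier demands before each header access.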

Now I've changed that program to use XDP_TX to send the packets back out on the interface they were received on, instead of redirecting them to another interface. Unfortunately, this simple change caused a big drop in throughput: it now barely handles ~4Mpps.
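For context: XDP_TX bounces the frame back out of the receiving NIC, so such a program typically rewrites the Ethernet header first, at minimum swapping the source and destination MACs. A minimal sketch of that swap on a raw buffer (in userspace, for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Swap source and destination MAC addresses in place, as an XDP
 * program typically does before returning XDP_TX so the frame is
 * addressed back to its sender. `eth` points at the start of a
 * 14-byte Ethernet header (dst MAC, src MAC, ethertype). */
void swap_eth_addrs(uint8_t *eth)
{
    uint8_t tmp[6];
    memcpy(tmp, eth, 6);          /* save dst */
    memcpy(eth, eth + 6, 6);      /* src -> dst */
    memcpy(eth + 6, tmp, 6);      /* old dst -> src */
}
```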

I don't understand what could be causing this or how to debug it further, which is why I'm asking here.

My minimal test setup to reproduce the issue:

When running the program with XDP_DROP, four cores on Machine 2 are slightly loaded with ksoftirqd threads while dropping around ~11Mpps. That only four cores are loaded makes sense, given that pktgen sends out four different packets that fill only four RX queues because of how the hashing in the NIC works.

But when running the program with XDP_TX, one of the cores is ~100% busy with ksoftirqd and only ~4Mpps are processed. I'm not sure why that happens.

Do you have an idea what might be causing this throughput drop and the increased CPU usage?

Edit: Here are some more details about the configuration of Machine 2:

# ethtool -g ens2f0
Ring parameters for ens2f0:
Pre-set maximums:
RX:             4096
RX Mini:        n/a
RX Jumbo:       n/a
TX:             4096
Current hardware settings:
RX:             512   # changing rx/tx to 4096 didn't help
RX Mini:        n/a
RX Jumbo:       n/a
TX:             512

# ethtool -l ens2f0
Channel parameters for ens2f0:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          1
Combined:       63
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          1
Combined:       32

# ethtool -x ens2f0
RX flow hash indirection table for ens2f0 with 32 RX ring(s):
    0:      0     1     2     3     4     5     6     7
    8:      8     9    10    11    12    13    14    15
   16:      0     1     2     3     4     5     6     7
   24:      8     9    10    11    12    13    14    15
   32:      0     1     2     3     4     5     6     7
   40:      8     9    10    11    12    13    14    15
   48:      0     1     2     3     4     5     6     7
   56:      8     9    10    11    12    13    14    15
   64:      0     1     2     3     4     5     6     7
   72:      8     9    10    11    12    13    14    15
   80:      0     1     2     3     4     5     6     7
   88:      8     9    10    11    12    13    14    15
   96:      0     1     2     3     4     5     6     7
  104:      8     9    10    11    12    13    14    15
  112:      0     1     2     3     4     5     6     7
  120:      8     9    10    11    12    13    14    15
RSS hash key:
d7:81:b1:8c:68:05:a9:eb:f4:24:86:f6:28:14:7e:f5:49:4e:29:ce:c7:2e:47:a0:08:f1:e9:31:b3:e5:45:a6:c1:30:52:37:e9:98:2d:c1
RSS hash function:
    toeplitz: on
    xor: off
    crc32: off

# uname -a
Linux test-2 5.8.0-44-generic #50-Ubuntu SMP Tue Feb 9 06:29:41 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Edit 2: I've now also tried MoonGen as a packet generator and flooded Machine 2 with 10Mpps across 100 different packet variations (flows). The traffic is now distributed much better between the cores, and dropping all of it causes only minimal CPU load. But XDP_TX still cannot keep up: it loads a single core to 100% while processing only ~3Mpps.

Upvotes: 3

Views: 1102

Answers (1)

Marcus Wichelmann

Reputation: 812

I've now upgraded the kernel of Machine 2 to 5.12.0-rc3 and the issue disappeared. It looks like this was a kernel issue.

If somebody knows more about this or can point to a changelog entry regarding it, please let me know.

Upvotes: 1
