raktim bhatt

Reputation: 13

Peculiar behaviour with Mellanox ConnectX-5 and DPDK in rxonly mode

Recently I observed a peculiar behaviour with a Mellanox ConnectX-5 100 Gbps NIC while capturing traffic with DPDK testpmd in rxonly mode. With 12 queues I was able to receive 142 Mpps, but with 11 queues it was only 96 Mpps, with 10 queues 94 Mpps, and with 9 queues 92 Mpps. Can anyone explain why there is such an abrupt jump in capture performance from 11 queues to 12 queues?

The details of the setup are mentioned below.

I have connected two servers back to back. One of them (server-1) is used for traffic generation and the other (server-2) is used for traffic reception. In both servers I am using a Mellanox ConnectX-5 NIC. The performance tuning parameters mentioned in section 3 of https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf (pages 11-12) have been followed.

Both servers have the same configuration.

Server configuration

  1. Processor: Intel Xeon Scalable processor (6148 series), 20 cores with HT, 2.4 GHz, 27.5 MB L3 cache
  2. Number of processors: 4
  3. RAM: 256 GB, 2666 MHz

The DPDK version used is dpdk-19.11 and the OS is RHEL 8.0.

For traffic generation, testpmd with --forward=txonly and --txonly-multi-flow is used. The command used is below.

Packet generation testpmd command in server-1

./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa --forward=txonly --txonly-multi-flow

testpmd> set txpkts 64

It was able to generate 64-byte packets at a sustained rate of 142.2 Mpps (a sanity check against the theoretical 64-byte line rate is sketched below). This is used as input to the second server, which works in rxonly mode. The command for reception is given after the sketch.
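For reference, the theoretical maximum packet rate for 64-byte frames on a 100 Gbps link is about 148.8 Mpps, because each frame occupies an extra 20 bytes of preamble and inter-frame gap on the wire. A minimal Python sketch of that arithmetic (nothing here is taken from the testpmd output, it is just standard Ethernet framing):

  # Theoretical 64-byte packet rate on a 100 Gbps Ethernet link.
  # Framing overhead: 7 B preamble + 1 B SFD + 12 B inter-frame gap = 20 B.
  LINK_BPS = 100e9          # line rate in bits per second
  FRAME_BYTES = 64          # minimum Ethernet frame, including CRC
  OVERHEAD_BYTES = 20       # preamble + SFD + inter-frame gap

  wire_bits = (FRAME_BYTES + OVERHEAD_BYTES) * 8
  line_rate_pps = LINK_BPS / wire_bits
  print(f"theoretical 64B line rate: {line_rate_pps / 1e6:.1f} Mpps")  # ~148.8 Mpps

  # The generator's 142.2 Mpps is therefore roughly 95% of line rate.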

Packet Reception command with 12 cores in server-2

./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa

testpmd> set fwd rxonly

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1363328297 RX-missed: 0          RX-bytes:  87253027549
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 19         TX-errors: 0          TX-bytes:  3493

  Throughput (since last show)
  Rx-pps:    142235725          Rx-bps:  20719963768
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

Packet Reception command with 11 cores in server-2

./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa

testpmd> set fwd rxonly

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1507398174 RX-missed: 112937160  RX-bytes:  96473484013
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 867061720  TX-errors: 0          TX-bytes:  55491950935

  Throughput (since last show)
  Rx-pps:     96718960          Rx-bps:  49520107600
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

As you can see, there is a sudden jump in Rx-pps from 11 cores to 12 cores. This variation was not observed elsewhere, e.g. from 8 to 9, 9 to 10, or 10 to 11 cores.

Can anyone explain the reason for this sudden jump in performance? (A quick per-queue breakdown of the numbers above is sketched below.)
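For reference, dividing the reported Rx-pps by the queue count gives the average per-queue rate. This is plain arithmetic on the two throughput readings shown above (a small illustrative script, not part of the test setup):

  # Average per-queue RX rate, computed from the Rx-pps values reported above.
  observed_rx_pps = {
      12: 142_235_725,  # 12 RX queues / 12 cores
      11: 96_718_960,   # 11 RX queues / 11 cores
  }

  for queues, pps in observed_rx_pps.items():
      print(f"{queues} queues: {pps / 1e6:6.1f} Mpps total, "
            f"{pps / queues / 1e6:5.1f} Mpps per queue on average")

  # ~11.9 Mpps/queue at 12 queues vs ~8.8 Mpps/queue at 11 queues:
  # the per-queue rate itself jumps, so this is not simple linear scaling.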

The same experiment was conducted, this time using 11 cores for traffic generation.

./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa --forward=txonly --txonly-multi-flow

testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 0          RX-missed: 0          RX-bytes:  0
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 2473087484 TX-errors: 0          TX-bytes:  158277600384

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:    142227777          Tx-bps:  72820621904
  ############################################################################

On the capture side with 11 cores

./testpmd -l 1,2,3,4,5,6,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa

testpmd> set fwd rxonly

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 8411445440 RX-missed: 9685       RX-bytes:  538332508206
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 0          TX-errors: 0          TX-bytes:  0

  Throughput (since last show)
  Rx-pps:     97597509          Rx-bps:    234643872
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

On the capture side with 12 cores

./testpmd -l 1,2,3,4,5,6,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa

testpmd> set fwd rxonly

testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 9370629638 RX-missed: 6124       RX-bytes:  554429504128
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 0          TX-errors: 0          TX-bytes:  0

  Throughput (since last show)
  Rx-pps:    140664658          Rx-bps:    123982640
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

The sudden jump in performance from 11 to 12 cores still remains.

Upvotes: 0

Views: 1262

Answers (1)

Vipin Varghese

Reputation: 4798

With the DPDK LTS releases 19.11, 20.11 and 21.11, running in plain vector mode (the default mode) on Mellanox CX-5 and CX-6 does not reproduce the problem mentioned above.

[EDIT-1] Retested with rxqs_min_mprq=1 for 2 * 100 Gbps with 64B packets. For 16 RXTX queues on 16T16C this resulted in a degradation of 9~10 Mpps. For every RX queue count from 1 to 7 there is a degradation of about 6 Mpps with rxqs_min_mprq=1.

Following is the capture for RXTX-to-core scaling: [image: RXTX-to-core scaling chart]

Investigating the MPRQ claim, the following are some of the unique observations:

  1. For both MLX CX-5 and CX-6, the maximum that each RX queue can attain is around 36 to 38 Mpps (see the sketch after this list).
  2. A single core can achieve up to 90 Mpps (64B) with 3 RXTX queues in IO mode, using AMD EPYC Milan, on both CX-5 and CX-6.
  3. 100 Gbps at 64B can be achieved with 14 logical cores (7 physical cores) with testpmd in IO mode.
  4. For both CX-5 and CX-6, 2 * 100 Gbps at 64B requires MPRQ and compression techniques to allow more packets in and out of the system.
  5. There is a multitude of configuration tuning required to achieve these numbers. Please refer to the Stack Overflow question and the DPDK MLX tuning parameters for more information.
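To put observation 1 in perspective, here is a back-of-the-envelope Python sketch relating the per-queue ceiling to the theoretical 64-byte line rate; the 37 Mpps figure is simply the midpoint of the 36-38 Mpps range quoted above:

  import math

  # Rough estimate: how many RX queues the observed per-queue ceiling implies
  # for 64-byte packets at 100 Gbps line rate.
  PER_QUEUE_MAX_PPS = 37e6                       # midpoint of the 36-38 Mpps ceiling
  LINE_RATE_64B_PPS = 100e9 / ((64 + 20) * 8)    # ~148.8 Mpps theoretical

  min_queues = math.ceil(LINE_RATE_64B_PPS / PER_QUEUE_MAX_PPS)
  print(f"per-queue ceiling implies at least {min_queues} RX queues")  # 5

  # In practice the answer reports 14 logical cores (7 physical) for 100 Gbps
  # at 64B in IO mode (observation 3), so the queue ceiling is only one of
  # several limits.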


PCIe Gen4 bandwidth is not the limiting factor; rather, the NIC ASIC with its internal embedded switch results in the behaviour mentioned above. Hence, to overcome this limitation one needs to use PMD arguments to activate the hardware features, which further increases the CPU overhead in PMD processing. Thus there is a barrier (more CPU is needed) to process the compressed and multi-packet (inlined) entries and convert them into single DPDK mbufs. This is the reason why more threads are required when using these PMD arguments.
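As a purely conceptual illustration of that last point (this is not the mlx5 PMD code and the names are hypothetical): with MPRQ the NIC packs several packets into one receive stride, and the PMD must walk the stride on the CPU and materialize one mbuf per packet, which is the extra per-packet work that calls for more threads.

  # Hypothetical illustration only -- not the mlx5 PMD implementation.
  from dataclasses import dataclass

  @dataclass
  class Mbuf:                  # stand-in for a DPDK rte_mbuf
      data: bytes

  def split_mprq_stride(stride: bytes, pkt_len: int) -> list[Mbuf]:
      """Walk one multi-packet receive stride and build one Mbuf per packet."""
      return [Mbuf(stride[off:off + pkt_len])
              for off in range(0, len(stride), pkt_len)]

  # One stride carrying eight 64-byte packets -> eight mbufs built by the CPU,
  # i.e. the per-packet software cost the answer refers to.
  stride = bytes(64 * 8)
  print(len(split_mprq_stride(stride, 64)))  # 8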


Note:

Test application: testpmd
EAL args: --in-memory --no-telemetry --no-shconf --single-file-segments --file-prefix=2 -l 7,8-31
PMD args (vector mode): none
PMD args for 2 * 100Gbps line rate: txq_inline_mpw=204,txqs_min_inline=1,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=12,rxq_pkt_pad_en=1,rxq_cqe_comp_en=4

Upvotes: 1
