fjanisze

Reputation: 1262

Dropped packets even though rte_eth_rx_burst does not return a full burst

I have a weird drop problem, and the best way to understand my question is to look at this simple snippet:

while( 1 )
{
    if( config->running == false ) {
        break;
    }
    num_of_pkt = rte_eth_rx_burst( config->port_id,
                                   config->queue_idx,
                                   buffers,
                                   MAX_BURST_DEQ_SIZE);
    if( unlikely( num_of_pkt == MAX_BURST_DEQ_SIZE ) ) {
        rx_ring_full = true; //probably not the best name
    }

    if( likely( num_of_pkt > 0 ) )
    {
        pk_captured += num_of_pkt;

        num_of_enq_pkt = rte_ring_sp_enqueue_bulk(config->incoming_pkts_ring,
                                               (void*)buffers,
                                               num_of_pkt,
                                               &rx_ring_free_space);
        //if num_of_enq_pkt == 0 free the mbufs..
    }
}

This loop is retrieving packets from the device and pushing them into a queue for further processing by another lcore.
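
For completeness, the mbuf-release path hinted at by the comment in the snippet might look roughly like this (just a sketch meant to slot into the loop above; buffers, num_of_pkt and num_of_enq_pkt are the variables from the snippet):

/* A failed bulk enqueue enqueues nothing at all, so every mbuf returned
 * by rte_eth_rx_burst() has to be handed back to its mempool, otherwise
 * the pool slowly drains. */
if( unlikely( num_of_enq_pkt == 0 ) ) {
    uint16_t i;
    for( i = 0; i < num_of_pkt; i++ )
        rte_pktmbuf_free( buffers[i] );
}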

When I run a test with a Mellanox card, sending 20M (20878300) packets at 2.5M p/s, the loop seems to miss some packets and pk_captured always ends up around 19M or so.

rx_ring_full is never true, which means that num_of_pkt is always < MAX_BURST_DEQ_SIZE, so according to the documentation I should not have drops at the HW level. Also, num_of_enq_pkt is never 0, which means that all the packets are enqueued.

Now, if from that snippet I remove the rte_ring_sp_enqueue_bulk call (and make sure to release all the mbufs), then pk_captured is always exactly equal to the number of packets I've sent to the NIC.

So it seems (though I can't quite accept this idea) that rte_ring_sp_enqueue_bulk is somehow too slow, and between one call to rte_eth_rx_burst and the next some packets are dropped due to a full ring on the NIC. But then why is num_of_pkt (from rte_eth_rx_burst) always smaller than MAX_BURST_DEQ_SIZE (much smaller), as if there were always sufficient room for the packets?

Note, MAX_BURST_DEQ_SIZE is 512.

edit 1:

Perhaps this information might help: the drops also seem to be visible via rte_eth_stats_get, or to be more precise, no drops are reported (imissed and ierrors are 0), but the value of ipackets equals my counter pk_captured (have the missing packets just disappeared??)

edit 2:

According to ethtool, rx_crc_errors_phy is zero and all the packets are received at the PHY level (rx_packets_phy is updated with the correct number of transferred packets).

The value of rx_nombuf from rte_eth_stats seems to contain garbage (this is a print from our test application):

OUT(4): Port 1 stats: ipkt:19439285,opkt:0,ierr:0,oerr:0,imiss:0, rxnobuf:2061021195718

For a transfer of 20M packets, as you can see, rxnobuf is either garbage or has a meaning I do not understand. The log is generated by:

  log("Port %"PRIu8" stats: ipkt:%"PRIu64",opkt:%"PRIu64",ierr:%"PRIu64",oerr:%"PRIu64",imiss:%"PRIu64", rxnobuf:%"PRIu64,
        config->port_id,
        stats.ipackets, stats.opackets,
        stats.ierrors, stats.oerrors,
        stats.imissed, stats.rx_nombuf);

where stats comes from rte_eth_stats_get.
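
For reference, a minimal sketch of how the stats structure used above is filled (nothing beyond the standard call; config->port_id is the port from the question, and the return-value check assumes a DPDK version where rte_eth_stats_get() returns an error code):

struct rte_eth_stats stats;

if( rte_eth_stats_get( config->port_id, &stats ) == 0 ) {
    /* stats.ipackets, stats.imissed, stats.rx_nombuf, ... are now populated */
}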

The packets are not generated on the fly but replayed from an existing PCAP.

edit 3

After the answer from Andriy (thanks!) I've included the xstats output for the Mellanox card. While reproducing the same problem with a smaller set of packets, I can see that rx_mbuf_allocation_errors gets updated, but it seems to contain garbage:

OUT(4): rx_good_packets = 8094164
OUT(4): tx_good_packets = 0
OUT(4): rx_good_bytes = 4211543077
OUT(4): tx_good_bytes = 0
OUT(4): rx_missed_errors = 0
OUT(4): rx_errors = 0
OUT(4): tx_errors = 0
OUT(4): rx_mbuf_allocation_errors = 146536495542

These counters also seem interesting:

OUT(4): tx_errors_phy = 0
OUT(4): rx_out_of_buffer = 257156
OUT(4): tx_packets_phy = 9373
OUT(4): rx_packets_phy = 8351320

Here rx_packets_phy is the exact number of packets I've been sending, and summing rx_out_of_buffer with rx_good_packets gives exactly that amount. So it seems the mbufs get depleted and some packets are dropped.
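
If the mbufs are indeed being depleted, one way to confirm it is to watch the RX pool occupancy while the test runs. A sketch, assuming the mempool pointer is reachable as config->mbuf_pool (which is not part of the original snippet):

/* How many mbufs are still free in the RX pool? If this approaches zero
 * while the other lcore lags behind, the PMD cannot refill its RX
 * descriptors and the NIC starts dropping. */
unsigned int avail  = rte_mempool_avail_count( config->mbuf_pool );
unsigned int in_use = rte_mempool_in_use_count( config->mbuf_pool );

log( "mbuf pool: %u available, %u in use", avail, in_use );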

I made a tweak to the original code: I'm now making a copy of the mbuf from the RX ring (using link) and then immediately releasing the memory; further processing is done on the copy by another lcore. Sadly this does not fix the problem. It turns out that to solve it I have to disable the packet processing and also release the packet copy (on the other lcore), which makes no sense.
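
If "link" above means indirect mbufs (rte_pktmbuf_clone()), one caveat is worth spelling out in a sketch (clone_pool and the loop index i are assumptions, not part of the original code): a clone only attaches to the original's data buffer, so that buffer goes back to the RX mempool only after both the original and the clone have been freed.

/* Cloning does NOT duplicate the packet data: the clone is an indirect
 * mbuf referencing the original's buffer. Freeing the original here only
 * drops one reference; the RX mempool gets the buffer back only when the
 * clone is freed on the processing lcore as well. */
struct rte_mbuf *copy = rte_pktmbuf_clone( buffers[i], clone_pool );
if( copy != NULL ) {
    rte_pktmbuf_free( buffers[i] );  /* original released immediately */
    /* ... hand "copy" over to the processing lcore ... */
}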

Well, I will do a bit more investigation, but at least rx_mbuf_allocation_errors seems to need a fix here.

Upvotes: 3

Views: 3960

Answers (2)

Vipin Varghese

Reputation: 4798

The solution to the missing packets is to change the ring API from bulk to burst. In DPDK there are two ring operation modes: bulk and burst. In bulk dequeue mode, if the requested number of elements is 32 and only 31 are available, the API returns 0.
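
A sketch of what that change could look like with the variables from the question (rte_ring_sp_enqueue_burst() enqueues as many objects as fit and returns that count, instead of the all-or-nothing behaviour of the bulk call):

/* Burst mode: enqueue as much as the ring can take right now. */
unsigned int i;

num_of_enq_pkt = rte_ring_sp_enqueue_burst( config->incoming_pkts_ring,
                                            (void**)buffers,
                                            num_of_pkt,
                                            &rx_ring_free_space );

/* Packets that did not fit are still owned by the caller and must be freed. */
for( i = num_of_enq_pkt; i < num_of_pkt; i++ )
    rte_pktmbuf_free( buffers[i] );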

I have faced a similar issue too.

Upvotes: 0

Andriy Berestovskyy

Reputation: 8544

I guess debugging the rx_nombuf counter is the way to go. It might look like garbage, but in fact this counter does not reflect the number of dropped packets (as ierrors or imissed do), but rather the number of failed RX attempts.

Here is a snippet from MLX5 PMD:

uint16_t
mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
{
    [...]
    while (pkts_n) {
        [...]
        rep = rte_mbuf_raw_alloc(rxq->mp);
        if (unlikely(rep == NULL)) {
            ++rxq->stats.rx_nombuf;
            if (!pkt) {
                /*
                 * no buffers before we even started,
                 * bail out silently.
                 */
                break;

So, the plausible scenario for the issue is as follows:

  1. There is a packet in the RX queue.
  2. There are no buffers in the corresponding mempool.
  3. The application polls for new packets, i.e. calls num_of_pkt = rte_eth_rx_burst(...) in a loop.
  4. Each time we call rte_eth_rx_burst(), the rx_nombuf counter gets increased.

Please also have a look at rte_eth_xstats_get(). For the MLX5 PMD there is a hardware rx_out_of_buffer counter, which might confirm this theory.
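
A sketch of how those extended counters can be read by name (error handling trimmed; the array size of 256 and the port_id variable are assumptions):

/* Fetch the xstats names and values, then look up a specific counter
 * such as the MLX5 "rx_out_of_buffer". */
struct rte_eth_xstat_name names[256];
struct rte_eth_xstat values[256];
int i, n;

n = rte_eth_xstats_get_names(port_id, names, 256);
if (n > 0 && n <= 256 && rte_eth_xstats_get(port_id, values, n) == n) {
    for (i = 0; i < n; i++) {
        if (strcmp(names[i].name, "rx_out_of_buffer") == 0)
            printf("rx_out_of_buffer = %" PRIu64 "\n", values[i].value);
    }
}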

Upvotes: 1
