jack fang

Reputation: 63

DPDK mlx5 driver hits a buffer overflow while receiving packets

problem summary

My application uses DPDK-stable-20.11.10. This morning a segmentation fault occurred when I ran the application as usual. I traced it with gdb and found that the fault comes from an illegal memory access inside DPDK. It happens in the function rxq_cq_decompress_v, defined in drivers/net/mlx5/mlx5_rxtx_vec_neon.h (this function has a separate implementation per architecture; my server is aarch64). Here is the code in question:

static inline uint16_t
rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
            struct rte_mbuf **elts)
{
    ......

    /*
     * A. load mCQEs into a 128bit register.
     * B. store rearm data to mbuf.
     * C. combine data from mCQEs with rx_descriptor_fields1.
     * D. store rx_descriptor_fields1.
     * E. store flow tag (rte_flow mark).
     */
    for (pos = 0; pos < mcqe_n; ) {
        uint8_t *p = (void *)&mcq[pos % 8];
        uint8_t *e0 = (void *)&elts[pos]->rearm_data;
        uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data;
        uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data;
        uint8_t *e3 = (void *)&elts[pos + 3]->rearm_data;
        uint16x4_t byte_cnt;
#ifdef MLX5_PMD_SOFT_COUNTERS
        uint16x4_t invalid_mask =
            vcreate_u16(mcqe_n - pos < MLX5_VPMD_DESCS_PER_LOOP ?
                    -1UL << ((mcqe_n - pos) *
                         sizeof(uint16_t) * 8) : 0);
#endif

        ......

The overflow happened on the line uint8_t *e3 = (void *)&elts[pos + 3]->rearm_data;, when DPDK tried to access elts[pos + 3].

In later attempts to reproduce the error, I found that it can occur at elts[pos + 3], at elts[pos + 2], or at other accesses to the elts array near the lines above.
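To make the suspected overrun concrete, here is a minimal sketch (not DPDK code) of how far a loop that reads four slots per iteration reaches past its start. It assumes the loop advances pos by 4 each pass, which the elided part of the excerpt does not show; DESCS_PER_LOOP, highest_index_touched and reads_past_ring_size are hypothetical names used only for illustration. It also ignores any padding entries the driver may allocate after the ring, which the excerpt does not show either.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-iteration width; assumed to be 4. */
#define DESCS_PER_LOOP 4u

/*
 * Highest offset read by a loop that starts at pos = 0, steps by
 * DESCS_PER_LOOP, and reads elts[pos] .. elts[pos + 3] on every pass.
 * n is the number of entries to process and must be greater than 0.
 */
static inline uint32_t
highest_index_touched(uint32_t n)
{
    uint32_t last_pos = ((n - 1u) / DESCS_PER_LOOP) * DESCS_PER_LOOP;

    return last_pos + DESCS_PER_LOOP - 1u;
}

/*
 * True if such a loop, starting at ring slot start_idx, would read at or
 * beyond slot ring_size (the nominal number of ring entries).
 */
static inline bool
reads_past_ring_size(uint32_t ring_size, uint32_t start_idx, uint32_t n)
{
    return start_idx + highest_index_touched(n) >= ring_size;
}
```

For example, with n = 5 the last pass starts at pos = 4 and reads up to offset 7, so three slots beyond the five requested entries are touched.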

ask for help

I read the related code and found that elts seems to be a ring buffer. In the rxq_burst_v function, which calls rxq_cq_decompress_v, a mask named e_mask is used to index this array:

static inline uint16_t
rxq_burst_v(struct mlx5_rxq_data *rxq, struct rte_mbuf **pkts,
        uint16_t pkts_n, uint64_t *err, bool *no_cq)
{
    ......

    const uint16_t e_n = 1 << rxq->elts_n;
    const uint16_t e_mask = e_n - 1;

    ......

    if (rcvd_pkt > 0) {
        ......
        rxq_copy_mbuf_v(&(*rxq->elts)[rxq->rq_pi & e_mask],
                pkts, rcvd_pkt);
        ......
    }
    elts_idx = rxq->rq_pi & e_mask;
    elts = &(*rxq->elts)[elts_idx];
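The masking in the excerpt above follows the usual pattern for a ring whose size is a power of two. The sketch below is not DPDK code; ring_size and ring_slot are hypothetical helper names that mirror how e_n and e_mask are derived from a log2 size and then applied to a free-running counter such as rq_pi.

```c
#include <stdint.h>

/* Ring size from its log2, mirroring `e_n = 1 << rxq->elts_n`. */
static inline uint16_t
ring_size(unsigned int log2_n)
{
    return (uint16_t)(1u << log2_n);
}

/*
 * Map a free-running counter to a slot index. mask must be size - 1,
 * where size is a power of two, so the AND acts as a cheap modulo.
 */
static inline uint16_t
ring_slot(uint16_t counter, uint16_t mask)
{
    return (uint16_t)(counter & mask);
}
```

Note that the mask only wraps the starting index. Any fixed offsets added afterwards, such as pos + 3, are not wrapped by it.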

I am sorry that I cannot provide code or screenshots from my application because of confidentiality requirements, but the problem seems to happen inside DPDK. My questions are:

  1. What does the DPDK code listed above actually mean? Given that elts is a ring buffer, why isn’t there any check for array bounds, or why doesn't the code use some mask like e_mask?
  2. What steps can be taken to avoid this issue?

I appreciate your help! :)

Upvotes: 1

Views: 49

Answers (0)
