Alex

Reputation: 13116

What is the main difference between RSS, RPS and RFS?

The kernel documentation describes them: https://www.kernel.org/doc/Documentation/networking/scaling.txt

Does it mean that:

Is that correct?

Upvotes: 4

Views: 12576

Answers (2)

Tgilgul

Reputation: 1744

osgx's answer covers the main differences, but it is important to point out that it is also possible to use RSS and RPS in unison.

RSS controls which HW queue receives a given stream of packets. Once certain conditions are met, an interrupt is issued to the SW. The interrupt handler, which is defined by the NIC's driver, is the SW starting point for processing received packets. The code there polls packets from the relevant receive queue, may perform initial processing, and then moves the packets up for higher-level protocol processing.

At this point the RPS mechanism may be used, if configured. The driver calls netif_receive_skb(), which (eventually) checks the RPS configuration. If one exists, it enqueues the SKB to continue processing on the selected CPU:

int netif_receive_skb(struct sk_buff *skb)
{
        ...
        return netif_receive_skb_internal(skb);
}

static int netif_receive_skb_internal(struct sk_buff *skb)
{
        ...
                /* Pick the target CPU from the RPS map,
                 * or -1 if RPS is not configured. */
                int cpu = get_rps_cpu(skb->dev, skb, &rflow);

                if (cpu >= 0) {
                        /* Hand the skb to the chosen CPU's backlog
                         * queue; that CPU continues the processing. */
                        ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
                        rcu_read_unlock();
                        return ret;
                }
        ...
}

In some scenarios, it is smart to use RSS and RPS together in order to avoid CPU utilization bottlenecks on the receiving side. A good example is IPoIB (IP over InfiniBand). Without diving into too many details, IPoIB has a mode in which it can only open a single channel. This means all the incoming traffic is handled by a single core. By properly configuring RPS, some of the processing load can be shared among multiple cores, which dramatically improves performance for this scenario.

Since transmitting was mentioned, it is worth noting that packet transmission triggered by the receive path (ACKs, forwarding) is processed on the same core selected by netif_receive_skb().

Hope this helps.

Upvotes: 5

osgx

Reputation: 94245

Quotes are from https://www.kernel.org/doc/Documentation/networking/scaling.txt.

  • RSS: Receive Side Scaling - is implemented in hardware and hashes some bytes of each packet ("hash function over the network and/or transport layer headers-- for example, a 4-tuple hash over IP addresses and TCP ports of a packet"). Implementations differ: some may not hash the most useful bytes or may be limited in other ways. This classification and queue distribution is fast (only a few additional cycles are needed in HW to classify a packet), but it is not portable across all network cards, cannot be used with tunneled packets or some rare protocols, and sometimes the hardware does not support enough queues to give each logical CPU core its own queue.

RSS should be enabled when latency is a concern or whenever receive interrupt processing forms a bottleneck. Spreading load between CPUs decreases queue length.

  • Receive Packet Steering (RPS) "is logically a software implementation of RSS. Being in software, it is necessarily called later in the datapath.". So, this is a software alternative to hardware RSS (it still parses some bytes to hash them into a queue id), for when you use hardware without RSS, want to classify based on rules more complex than the HW supports, or have a protocol that the HW RSS classifier can't parse. The downside is that RPS consumes more CPU resources and generates additional inter-CPU traffic.

RPS has some advantages over RSS: 1) it can be used with any NIC, 2) software filters can easily be added to hash over new protocols, 3) it does not increase hardware device interrupt rate (although it does introduce inter-processor interrupts (IPIs)).

  • RFS: Receive Flow Steering is like RPS (a software mechanism with more CPU overhead), but instead of hashing into a pseudo-random queue id, it takes "into account application locality" (so packet processing will probably be faster due to good cache locality). Flows are tracked so that processing stays local to the thread that will consume the received data, and packets are delivered to the correct CPU core.

The goal of RFS is to increase datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running. RFS relies on the same RPS mechanisms to enqueue packets onto the backlog of another CPU and to wake up that CPU. ... In RFS, packets are not forwarded directly by the value of their hash, but the hash is used as index into a flow lookup table. This table maps flows to the CPUs where those flows are being processed.

  • Accelerated RFS - RFS with HW support (check your network driver for ndo_rx_flow_steer). "Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load balancing mechanism that uses soft state to steer flows based on where the application thread consuming the packets of each flow is running.".

A similar method exists for packet transmission (but there the packet is already generated and ready to be sent; the task is just to select the best queue to send it on - which also eases post-processing such as freeing the skb):

  • XPS: Transmit Packet Steering: "a mapping from CPU to hardware queue(s) is recorded. The goal of this mapping is usually to assign queues exclusively to a subset of CPUs, where the transmit completions for these queues are processed on a CPU within this set"

Upvotes: 11
