Fopa Léon Constantin

Reputation: 12363

Why does disabling IRQs on Linux cause rdma_read and rdma_write to fail?

I have two host machines connected by Mellanox InfiniBand HCAs. I'm running a simple RDMA application that performs RDMA write and RDMA read operations from one machine (the client) on the other (the server). To find out which interrupts belong to the HCA on each machine, I ran the following command:

  less /proc/interrupts

  67:   475880   50253       0       0   PCI-MSI-edge    mlx4-async@pci:0000:01:00.0
  68:   399002       0      73       0   PCI-MSI-edge    mlx4_0-0
  69:        0    3264      23       0   PCI-MSI-edge    mlx4_0-1
  70:        0       0       0       0   PCI-MSI-edge    mlx4_0-2
  71:        0       0       0       0   PCI-MSI-edge    mlx4_0-3
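As a rough sketch of how the same lookup could be scripted, the snippet below picks the mlx4 lines out of text in the format shown above and sums the per-CPU counters for each IRQ. The function name `find_irqs` and the `SAMPLE` constant are made up for illustration, and the parsing assumes that the per-CPU counts directly follow the IRQ number and that the handler name is the last field on the line.

```python
from itertools import takewhile

# Sample text copied from the listing above (hypothetical constant name).
SAMPLE = """\
  67:   475880  50253       0       0   PCI-MSI-edge    mlx4-async@pci:0000:01:00.0
  68:   399002      0       73      0   PCI-MSI-edge    mlx4_0-0
  69:       0   3264        23      0   PCI-MSI-edge    mlx4_0-1
  70:       0       0       0       0   PCI-MSI-edge    mlx4_0-2
  71:       0       0       0       0   PCI-MSI-edge    mlx4_0-3
"""


def find_irqs(text, prefix="mlx4"):
    """Return {irq_number: (handler_name, total_count)} for matching lines.

    Assumption: each relevant line looks like
    "<irq>: <count per CPU>... <chip type> <handler name>".
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header line and anything too short to be an IRQ entry.
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        irq = fields[0][:-1]
        # Skip named entries such as NMI or LOC, which have no numeric IRQ.
        if not irq.isdigit():
            continue
        name = fields[-1]
        if not name.startswith(prefix):
            continue
        # The per-CPU counters are the run of numeric fields after the IRQ number.
        counts = [int(f) for f in takewhile(str.isdigit, fields[1:])]
        result[int(irq)] = (name, sum(counts))
    return result


if __name__ == "__main__":
    # On a live system, pass open("/proc/interrupts").read() instead of SAMPLE.
    for irq, (name, total) in sorted(find_irqs(SAMPLE).items()):
        print(irq, name, total)
```

Comparing the totals before and after running the RDMA test would show which of these vectors actually fire during the transfers.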

On the server machine, I found that calling __disable_irq() on those 4 interrupts causes all RDMA read/write operations issued by the client to fail with the error message "transport retry counter exceeded".

My question is: why and when can RDMA read/write operations generate IRQs on the remote machine? I thought that since they don't involve the remote CPU, they would not raise any IRQs there.

So why does disabling those interrupts cause these operations to fail?

Upvotes: 1

Views: 722

Answers (1)

CL.

Reputation: 180070

Not all transactions are RDMA transactions.

Furthermore, when you're writing to another machine's memory, you need interrupts to notice when the write has finished (so that you know when you can reuse your own memory), and to notify the other machine that new data has shown up in its memory.

Upvotes: 1
