Reputation: 12363
I have two host machines connected by Mellanox InfiniBand HCAs. I'm running a simple RDMA application in which one machine (the client) performs RDMA write and RDMA read operations on the other machine (the server). To find out which interrupts belong to the HCA on each machine, I ran the following command:
less /proc/interrupts
67: 475880 50253 0 0 PCI-MSI-edge mlx4-async@pci:0000:01:00.0
68: 399002 0 73 0 PCI-MSI-edge mlx4_0-0
69: 0 3264 23 0 PCI-MSI-edge mlx4_0-1
70: 0 0 0 0 PCI-MSI-edge mlx4_0-2
71: 0 0 0 0 PCI-MSI-edge mlx4_0-3
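To compare the counters before and after a test run, the same listing can be filtered programmatically. The following is a minimal sketch in Python, assuming the line layout shown above (IRQ number, one count per CPU, then the type and name). The function name parse_interrupt_lines and the SAMPLE text are illustrative only; on a live system one would read the contents of /proc/interrupts instead.

```python
def parse_interrupt_lines(text, needle="mlx4"):
    """Return {irq: (per_cpu_counts, description)} for lines mentioning needle."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "CPU0 CPU1 ..." header, which has no "NN:" prefix.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        irq = fields[0].rstrip(":")
        counts = []
        i = 1
        # Leading numeric fields are the per-CPU interrupt counts.
        while i < len(fields) and fields[i].isdigit():
            counts.append(int(fields[i]))
            i += 1
        description = " ".join(fields[i:])
        if needle in description:
            result[irq] = (counts, description)
    return result


# Sample copied from the listing above (illustrative input, not live data).
SAMPLE = """\
67: 475880 50253 0 0 PCI-MSI-edge mlx4-async@pci:0000:01:00.0
68: 399002 0 73 0 PCI-MSI-edge mlx4_0-0
69: 0 3264 23 0 PCI-MSI-edge mlx4_0-1
70: 0 0 0 0 PCI-MSI-edge mlx4_0-2
71: 0 0 0 0 PCI-MSI-edge mlx4_0-3
"""

if __name__ == "__main__":
    for irq, (counts, description) in parse_interrupt_lines(SAMPLE).items():
        # Print the IRQ number, the total across CPUs, and the description.
        print(irq, sum(counts), description)
```

Running it once before and once after the RDMA test, and diffing the totals, would show which of these vectors actually fire during the transfer.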
On the server machine, I found experimentally that calling __disable_irq() on those 4 interrupts causes all RDMA read/write operations performed by the client to fail with the error message "transport retry counter exceeded".
My question is: why and when can RDMA read/write operations generate IRQs on the remote machine? I thought that since they don't involve the remote CPU, they would not raise any IRQ there.
Why, then, does disabling those interrupts cause these operations to fail?
Upvotes: 1
Views: 722
Reputation: 180070
Not all transactions are RDMA transactions.
Furthermore, when you're writing to another machine's memory, you need interrupts to notice when the write has finished (so that you know when you can reuse your own memory), and to notify the other machine that new data has shown up in its memory.
Upvotes: 1