P2P memory access fails while running multi-GPU CUDA sample (simpleP2P)

Question

I'm trying to troubleshoot a problem I hit when running simpleP2P, one of the programs included in the CUDA samples. The output is as follows:

$ ./simpleP2P 
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = "     Tesla K20c" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "     Tesla K20c" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer-to-Peer (P2P) access from Tesla K20c (GPU0) -> Tesla K20c (GPU1) : No
> Peer-to-Peer (P2P) access from Tesla K20c (GPU1) -> Tesla K20c (GPU0) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available between GPU0 <-> GPU1, waiving test.

The devices I'm using are the following:

$ lspci | grep NVIDIA
03:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)

Additional information concerning connectivity obtained from nvidia-smi:

$ nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity
GPU0     X      SOC     0-5,12-17
GPU1    SOC      X      6-11,18-23

Legend:

  X   = Self
  SOC = Path traverses a socket-level link (e.g. QPI)
  PHB = Path traverses a PCIe host bridge
  PXB = Path traverses multiple PCIe internal switches
  PIX = Path traverses a PCIe internal switch

Finally, here is more verbose output from the lspci tool:

03:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
        Subsystem: NVIDIA Corporation Device 0982
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at f9000000 (32-bit, non-prefetchable)
        Memory at d0000000 (64-bit, prefetchable)
        Memory at ce000000 (64-bit, prefetchable)
        Capabilities: 
        Kernel driver in use: nvidia
        Kernel modules: nvidia_346, nouveau, nvidiafb
...
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
        Subsystem: NVIDIA Corporation Device 0982
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at cc000000 (32-bit, non-prefetchable)
        Memory at b0000000 (64-bit, prefetchable)
        Memory at ae000000 (64-bit, prefetchable)
        Capabilities: 
        Kernel driver in use: nvidia
        Kernel modules: nvidia_346, nouveau, nvidiafb

Does anyone have information that could help me troubleshoot this, or at least better understand where the problem is? Thanks as usual for reading/helping. -- Omar

Robert Crovella · Accepted Answer

When GPUs are interconnected via a socket-level link (QPI for an Intel-based system):

GPU0     X      SOC     0-5,12-17
GPU1    SOC      X      6-11,18-23
        ^^^

then P2P transactions are not possible between those two GPUs.

GPUs participating in P2P have a number of requirements placed on them. One of them is that they generally must be on the same PCIe root complex. GPUs that are connected via a socket-level link (e.g. QPI) are attached to two different "sockets", i.e. two different CPUs, and therefore belong to two different PCIe root complexes.
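To make the rule above concrete, here is a minimal sketch that reads one row of the `nvidia-smi topo -m` matrix and flags the entries that cross a socket-level link. The helper names `linkTags` and `crossesSocketLink` are hypothetical and are not part of any NVIDIA tool. It assumes the whitespace-separated row layout and the `SOC` tag shown in the question; other driver versions may print different tags, so treat this as a quick topology hint rather than a P2P test.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extract the per-GPU link tags from one matrix row,
// e.g. "GPU1    SOC  X  6-11,18-23" with gpuCount = 2 yields {"SOC", "X"}.
// Assumes whitespace-separated columns, as in the output quoted above.
std::vector<std::string> linkTags(const std::string& row, int gpuCount) {
    std::istringstream in(row);
    std::string token;
    std::vector<std::string> tags;
    in >> token;  // skip the row label ("GPU0", "GPU1", ...)
    for (int i = 0; i < gpuCount && (in >> token); ++i) {
        tags.push_back(token);
    }
    return tags;
}

// Hypothetical helper: true when the tag marks a path that traverses a
// socket-level link (e.g. QPI), per the legend printed by nvidia-smi.
bool crossesSocketLink(const std::string& tag) {
    return tag == "SOC";
}
```

Applied to the `GPU1` row above, the first tag is `SOC`, which matches the "No" that simpleP2P reports for this pair.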

Note that in general, P2P support may vary by GPU or GPU family. The ability to run P2P on one GPU type or GPU family does not necessarily indicate it will work on another GPU type or family, even in the same system/setup. The final determinant of GPU P2P support is what the runtime reports when queried via cudaDeviceCanAccessPeer, which is what the provided tools do. P2P support can vary by system and other factors as well. No statements made here are a guarantee of P2P support for any particular GPU in any particular setup.
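Since the runtime query is the deciding factor, a small sketch of asking it directly may help. `cudaDeviceCanAccessPeer` is the real CUDA runtime call. The names `PeerQuery`, `peerAccessMatrix`, `queryCudaPeerAccess` and the `HAVE_CUDA_RUNTIME` macro are hypothetical, introduced only for this illustration. The query is passed in as a callable so the matrix logic also compiles on a machine without the CUDA toolkit (this relies on C++17 `__has_include`).

```cpp
#include <functional>
#include <vector>

// Pull in the CUDA runtime only when its header is available.
#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define HAVE_CUDA_RUNTIME 1  // hypothetical macro, local to this sketch
#  endif
#endif

// Callable answering: can `device` directly access memory on `peerDevice`?
using PeerQuery = std::function<bool(int device, int peerDevice)>;

// Build a deviceCount x deviceCount matrix of peer-access answers.
// Diagonal entries stay false: a device is not its own peer.
std::vector<std::vector<bool>> peerAccessMatrix(int deviceCount,
                                                const PeerQuery& canAccess) {
    std::vector<std::vector<bool>> matrix(
        deviceCount, std::vector<bool>(deviceCount, false));
    for (int from = 0; from < deviceCount; ++from) {
        for (int to = 0; to < deviceCount; ++to) {
            if (from != to) {
                matrix[from][to] = canAccess(from, to);
            }
        }
    }
    return matrix;
}

#ifdef HAVE_CUDA_RUNTIME
// Wrapper around the runtime query; any error is treated as "no access".
inline bool queryCudaPeerAccess(int device, int peerDevice) {
    int canAccess = 0;
    cudaError_t err = cudaDeviceCanAccessPeer(&canAccess, device, peerDevice);
    return err == cudaSuccess && canAccess != 0;
}
#endif
```

With the toolkit installed, one would obtain the device count from `cudaGetDeviceCount`, call `peerAccessMatrix(count, queryCudaPeerAccess)`, and link against the CUDA runtime. On the system in the question, both off-diagonal entries should come back false, consistent with the two "No" lines printed by simpleP2P.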
