Reputation: 43
I'm trying to troubleshoot an error I ran into while running the simpleP2P sample program included in the CUDA samples. The output is as follows:
$ ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = " Tesla K20c" IS capable of Peer-to-Peer (P2P)
> GPU1 = " Tesla K20c" IS capable of Peer-to-Peer (P2P)
Checking GPU(s) for support of peer to peer memory access...
> Peer-to-Peer (P2P) access from Tesla K20c (GPU0) -> Tesla K20c (GPU1) : No
> Peer-to-Peer (P2P) access from Tesla K20c (GPU1) -> Tesla K20c (GPU0) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available between GPU0 <-> GPU1, waiving test.
The devices I'm using are the following:
$ lspci | grep NVIDIA
03:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
Additional information concerning connectivity obtained from nvidia-smi:
$ nvidia-smi topo -m
GPU0 GPU1 CPU Affinity
GPU0 X SOC 0-5,12-17
GPU1 SOC X 6-11,18-23
Legend:
X = Self
SOC = Path traverses a socket-level link (e.g. QPI)
PHB = Path traverses a PCIe host bridge
PXB = Path traverses multiple PCIe internal switches
PIX = Path traverses a PCIe internal switch
Finally, here is more verbose output from the lspci tool:
03:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
Subsystem: NVIDIA Corporation Device 0982
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f9000000 (32-bit, non-prefetchable)
Memory at d0000000 (64-bit, prefetchable)
Memory at ce000000 (64-bit, prefetchable)
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidia_346, nouveau, nvidiafb
...
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
Subsystem: NVIDIA Corporation Device 0982
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at cc000000 (32-bit, non-prefetchable)
Memory at b0000000 (64-bit, prefetchable)
Memory at ae000000 (64-bit, prefetchable)
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidia_346, nouveau, nvidiafb
Does anyone have information that could help me troubleshoot, or at least better understand, where the problem is? Thanks as usual for reading/helping. -- Omar
Upvotes: 2
Views: 2525
Reputation: 151982
When GPUs are interconnected via a socket-level link (QPI for an Intel-based system):
GPU0 X SOC 0-5,12-17
GPU1 SOC X 6-11,18-23
^^^
then P2P transactions are not possible between those 2 GPUs.
GPUs participating in P2P have a number of requirements placed on them. One of them is that they generally must be on the same PCIE root complex. GPUs that are connected via a socket-level link (e.g. QPI) are on two different "sockets" i.e. 2 different CPUs, and therefore they belong to two different PCIE root complexes.
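As a rough illustration of the rule above, here is a minimal Python sketch that scans `nvidia-smi topo -m` output and flags GPU pairs whose only path is a socket-level link. It assumes the matrix layout shown in the question (link types such as SOC in a whitespace-separated grid); newer driver versions may use different labels, and the helper names `parse_topo` and `cross_socket_pairs` are made up for this example. It is only a heuristic: the runtime query via cudaDeviceCanAccessPeer remains the authoritative check.

```python
# Sketch: flag GPU pairs whose `nvidia-smi topo -m` link type is SOC.
# Assumes the matrix layout shown in the question; parse_topo and
# cross_socket_pairs are hypothetical helper names, not part of any NVIDIA tool.

TOPO_OUTPUT = """\
GPU0 GPU1 CPU Affinity
GPU0 X SOC 0-5,12-17
GPU1 SOC X 6-11,18-23
"""


def parse_topo(text):
    """Return {(row_gpu, col_gpu): link_type} from the matrix rows."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    # Header columns that name GPUs; the trailing "CPU Affinity" is ignored.
    gpus = [tok for tok in lines[0] if tok.startswith("GPU")]
    links = {}
    for row in lines[1:]:
        if row[0] not in gpus:
            continue  # skip legend or unrelated lines
        for col_gpu, link in zip(gpus, row[1:1 + len(gpus)]):
            links[(row[0], col_gpu)] = link
    return links


def cross_socket_pairs(links):
    """Unordered GPU pairs connected only through a socket-level link (SOC)."""
    return sorted({tuple(sorted(pair)) for pair, link in links.items() if link == "SOC"})


if __name__ == "__main__":
    for a, b in cross_socket_pairs(parse_topo(TOPO_OUTPUT)):
        print(f"{a} <-> {b}: socket-level link, P2P unlikely to be available")
```

To try it against a live system, the embedded sample could be replaced with the text captured from `nvidia-smi topo -m`, provided the output follows the same layout.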
Note that in general, P2P support may vary by GPU or GPU family. The ability to run P2P on one GPU type or GPU family does not necessarily indicate it will work on another GPU type or family, even in the same system/setup. The final determinant of GPU P2P support is the result reported by the provided tools that query the runtime via cudaDeviceCanAccessPeer. P2P support can vary by system and other factors as well. No statements made here are a guarantee of P2P support for any particular GPU in any particular setup.
Upvotes: 3