CUDA: Differences between HtoD and DtoH bandwidth

Question

Yet another bandwidth-related question. I expected the plots of device-to-host and host-to-device bandwidth to be similar, but I see a significant difference between the two. Since both transfers follow the same route, shouldn't the effective bandwidth be the same? The testbed consists of a total of 12 Intel Westmere CPUs on two sockets and 4 Tesla C2050 GPUs in 4 PCI Express Gen2 slots. I am using the bandwidthTest program from the NVIDIA code samples.

What are the overheads of doing a cudaMemcpy from the host vs. from the device?

harrism · Accepted Answer

First, I would say those two curves are similar. I can honestly say that I've never seen symmetric PCI-e bandwidth on any system I have used -- and that includes both CUDA and graphics (OpenGL/D3D) tests, so I don't think it's something (especially this small difference) that should concern you.

As with your other PCI-e bandwidth question, the answer is similar -- the driver may use different strategies for different types and sizes of transfers, attempting to get the highest throughput possible.

Actual throughput depends on many factors, including the type of GPU, and especially on the host chipset in use.
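If you want to check the asymmetry on your own system without going through the full sample, a minimal sketch along these lines should do. It is not the bandwidthTest implementation: it assumes pinned host memory (`cudaMallocHost`), times copies with CUDA events, and the helper names (`CHECK`, `bandwidth_gbps`, `time_copy`), the size range, and the repetition count are all arbitrary choices made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Convert a byte count and elapsed milliseconds to GB/s (1 GB = 1e9 bytes).
static double bandwidth_gbps(size_t bytes, float ms) {
    return ((double)bytes / 1e9) / (ms / 1e3);
}

// Time `reps` back-to-back copies and return the average milliseconds per copy.
static float time_copy(void* dst, const void* src, size_t bytes,
                       cudaMemcpyKind kind, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemcpy(dst, src, bytes, kind));  // untimed warm-up copy

    CHECK(cudaEventRecord(start));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpyAsync(dst, src, bytes, kind));
    }
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main() {
    const int reps = 20;                // assumption: arbitrary repetition count
    const size_t max_bytes = 64 << 20;  // assumption: sweep up to 64 MiB

    void* h_buf = nullptr;
    void* d_buf = nullptr;
    CHECK(cudaMallocHost(&h_buf, max_bytes));  // pinned (page-locked) host buffer
    CHECK(cudaMalloc(&d_buf, max_bytes));

    printf("%12s %12s %12s\n", "bytes", "HtoD GB/s", "DtoH GB/s");
    for (size_t bytes = 1 << 10; bytes <= max_bytes; bytes <<= 2) {
        float h2d_ms = time_copy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, reps);
        float d2h_ms = time_copy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, reps);
        printf("%12zu %12.3f %12.3f\n", bytes,
               bandwidth_gbps(bytes, h2d_ms), bandwidth_gbps(bytes, d2h_ms));
    }

    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_buf));
    return 0;
}
```

Build it with something like `nvcc -o h2d_d2h h2d_d2h.cu` (the file name is just a placeholder). Expect the two columns to be close but not identical, and expect the gap to vary with transfer size. Swapping `cudaMallocHost` for plain `malloc` gives pageable memory, which will likely change both curves again.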
