Reputation: 3288
Suppose I have a cudaArray in GPU1 and another cudaArray in GPU2.
According to the profiler, calling cudaMemcpyArrayToArray with the cudaMemcpyDeviceToDevice flag actually copies the GPU1 cudaArray to host memory and then copies host memory to the GPU2 cudaArray.
I tried copying the GPU1 cudaArray to GPU1 global memory, calling cudaMemcpyPeer to copy it to GPU2 global memory, and then copying that to the GPU2 cudaArray. This is better than going through host memory, but there is still a lot of redundant copying.
Why isn't there a cudaMemcpyPeerArrayToArray? How do I copy a cudaArray between two GPUs directly?
Upvotes: 1
Views: 515
Reputation: 72372
There is a peer-to-peer API for CUDA arrays.
Use either cudaMemcpy3DPeer or cudaMemcpy3DPeerAsync. These use the best device-to-device transfer path among the peer-to-peer options available on your system.
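For illustration, here is a minimal sketch of how such a copy might look. The helper name copyArrayPeer and its parameters are hypothetical, not part of the CUDA API. It assumes both arrays were allocated with the same channel format and are at least as large as the requested extent. When arrays take part in the copy, the extent is given in array elements; as far as I know, a 2D array needs a depth of 1 in the copy extent.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): copies `extent` elements from
// srcArray, which lives on srcDevice, to dstArray, which lives on dstDevice.
// Assumes both arrays have the same element size and cover `extent`.
cudaError_t copyArrayPeer(cudaArray_t dstArray, int dstDevice,
                          cudaArray_t srcArray, int srcDevice,
                          cudaExtent extent)
{
    cudaMemcpy3DPeerParms p = {};          // zero-initialize all fields

    p.srcArray  = srcArray;                // source is a CUDA array, so srcPtr stays unset
    p.srcDevice = srcDevice;
    p.srcPos    = make_cudaPos(0, 0, 0);   // start at the array origin

    p.dstArray  = dstArray;                // destination is a CUDA array, so dstPtr stays unset
    p.dstDevice = dstDevice;
    p.dstPos    = make_cudaPos(0, 0, 0);

    p.extent    = extent;                  // in elements, because arrays are involved

    // Blocking variant; cudaMemcpy3DPeerAsync(&p, stream) is the stream-ordered one.
    return cudaMemcpy3DPeer(&p);
}
```

A call such as copyArrayPeer(arrayOnGpu2, 1, arrayOnGpu1, 0, make_cudaExtent(width, height, depth)) would then replace the three-step copy from the question. Whether the transfer actually goes directly over PCIe or NVLink depends on the hardware; checking cudaDeviceCanAccessPeer and calling cudaDeviceEnablePeerAccess beforehand may be needed to get the direct path.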
Upvotes: 3