Reputation: 14660
Consider the two snippets of code.
Snippet1
cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);
cudaMemcpyAsync( dst, src, size, dir, stream1 );
kernel<<<grid, block, 0, stream2>>>(...);
Snippet2
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);
cudaMemcpy( dst, src, size, dir, stream1 );
kernel<<<grid, block, 0, stream2>>>(...);
In both snippets I am issuing a memcpy call (asynchronous in Snippet1, synchronous in Snippet2).
Since the commands have been issued to two different streams, my understanding is that there can be potential overlap in both cases.
But in Snippet2 the cudaMemcpy call is synchronous (i.e., blocking), which leads me to the paradoxical conclusion that the cudaMemcpy and the kernel launch will execute one after the other.
Which conclusion is correct?
To rephrase more compactly: when we issue a cudaMemcpy call to a stream, does it block the entire program, or only the stream it was issued to?
Upvotes: 1
Views: 4859
Reputation: 81
ArcheaSoftware is partially correct. Synchronous calls indeed do not return control to the CPU until the operation has completed; in that sense, your kernel launch will only occur after the cudaMemcpy call returns. However, depending on your buffer types, the kernel may or may not be able to use the data transferred by the cudaMemcpy call. Some examples below:
Example 1:
cudaMallocHost(&src, size);
cudaMalloc(&dst, size);
cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice);
kernel<<<grid, block, 0, stream2>>>(...);
In this case, the kernel can use the data copied from src to dst.
Example 2:
src = malloc(size);
cudaMalloc(&dst, size);
cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice);
kernel<<<grid, block, 0, stream2>>>(...);
In this case, cudaMemcpy can return before the data has actually been transferred to the device. cudaMemcpy from unregistered host buffers (e.g., malloc'd buffers) only guarantees that the data has been copied out of the source buffer, perhaps into an intermediate staging buffer, before the call returns. This is surprising behavior, but it is defined as such in the NVIDIA CUDA documentation. Ref: https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior
In general, I recommend avoiding unregistered host buffers because of this behavior.
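If you need to keep an existing heap allocation rather than allocating with cudaMallocHost, one option is to pin it after the fact with cudaHostRegister, which makes subsequent copies behave like Example 1. A minimal sketch (buffer names and sizes here are illustrative, not from the question):

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

int main() {
    const size_t size = 1 << 20;
    char *src = (char *)malloc(size);
    char *dst = NULL;

    // Pin the existing malloc'd buffer; the driver can now DMA directly
    // from it, so cudaMemcpy returns only after the device copy completes.
    cudaHostRegister(src, size, cudaHostRegisterDefault);
    cudaMalloc(&dst, size);

    cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice);
    // At this point the data is guaranteed to be on the device, as in Example 1.

    cudaHostUnregister(src);
    cudaFree(dst);
    free(src);
    return 0;
}
```

Note that cudaHostRegister has a nontrivial cost, so it pays off mainly when the buffer is reused across many transfers.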
Upvotes: 1
Reputation: 4422
Synchronous calls do not return control to the CPU until the operation has been completed, so your second snippet will not even begin to submit the kernel launch until after the memcpy is done.
Your cudaMemcpy() call looks incorrect; I don't think you can pass a stream parameter to any variant of memcpy whose name does not end in "Async". As written, the compiler might accept the code and take the stream as the memcpy direction.
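For reference, a minimal sketch of the stream-taking variant, cudaMemcpyAsync (the kernel and buffer names here are placeholders standing in for those in the question):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel, standing in for the question's `kernel`.
__global__ void kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t size = n * sizeof(float);
    float *src, *dst, *other;

    cudaMallocHost(&src, size);  // pinned, so the async copy can truly overlap
    cudaMalloc(&dst, size);
    cudaMalloc(&other, size);

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    // The copy on stream1 and the kernel on stream2 (touching an unrelated
    // buffer) may overlap; the host is not blocked by either launch.
    cudaMemcpyAsync(dst, src, size, cudaMemcpyHostToDevice, stream1);
    kernel<<<(n + 255) / 256, 256, 0, stream2>>>(other, n);

    // Block the host only where ordering is actually needed.
    cudaStreamSynchronize(stream1);
    cudaStreamSynchronize(stream2);

    cudaFree(other);
    cudaFree(dst);
    cudaFreeHost(src);
    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    return 0;
}
```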
Upvotes: 3