Reputation: 482
After reading the "overlap of data transfer and kernel execution" section in the CUDA C Programming Guide, I have a question: what exactly does "data transfer" refer to? Does it include cudaMemsetAsync, cudaMemcpyAsync, cudaMemset, and cudaMemcpy? Of course, the host memory used for the copies is pinned.
In the implicit synchronization (streams) section, the guide says "a device memory set" may serialize the streams. So, does that refer to cudaMemsetAsync, cudaMemcpyAsync, cudaMemset, or cudaMemcpy? I am not sure.
Upvotes: 2
Views: 1192
Reputation: 12099
Any function call ending in Async takes a stream parameter. Additionally, some of the libraries provided by the CUDA toolkit can also be given a stream to work in. Using this, you can have multiple streams running concurrently.
This means that unless you specifically create and set a stream, you will be using the default stream. For example, there is no built-in "data transfer" stream or "kernel execution" stream. You have to create two (or more) streams yourself and assign each one a task of your choosing.
A common use case is to have the two streams as mentioned in the programming guide. Keep in mind, this is only useful if you have multiple kernel launches. You can get the data needed for the next (independent) kernel or the next iteration of the current kernel while computing the results for the current kernel. This can maximize both compute and bandwidth capabilities.
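As a rough sketch of that pattern, the input can be split into chunks, with each chunk's host-to-device copy, kernel launch, and device-to-host copy issued into its own stream so one chunk's transfer can overlap another chunk's compute. The kernel name, array sizes, and chunking scheme below are illustrative, not from the original post; pinned (cudaMallocHost) memory is what allows the copies to be truly asynchronous:

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: doubles each element of its chunk.
__global__ void myKernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20;
    const int nStreams = 2;
    const int chunk = N / nStreams;

    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));  // pinned host memory: required for overlap
    cudaMalloc(&d, N * sizeof(float));

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // While one stream runs its kernel, the other stream's copies can proceed.
    for (int s = 0; s < nStreams; ++s) {
        int offset = s * chunk;
        cudaMemcpyAsync(&d[offset], &h[offset], chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        myKernel<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(&d[offset], chunk);
        cudaMemcpyAsync(&h[offset], &d[offset], chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();  // wait for all streams to drain

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```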
Of the function calls you mention, cudaMemcpy and cudaMemcpyAsync are the only ones performing data transfers. I don't think cudaMemset and cudaMemsetAsync can be termed data transfers.
Both cudaMemcpyAsync and cudaMemsetAsync can be used with streams, while cudaMemset and cudaMemcpy are blocking calls that do not take a stream argument.
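To make the contrast concrete, here is a minimal sketch. The buffer names and size are illustrative; the key difference is only the trailing stream argument on the Async variants (and note the copy in cudaMemcpyAsync only overlaps other work if h_buf is pinned):

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    float *d_buf, *h_buf;
    cudaMalloc(&d_buf, bytes);
    cudaMallocHost(&h_buf, bytes);  // pinned, so the async copy can overlap

    cudaStream_t s;
    cudaStreamCreate(&s);

    // Blocking calls: no stream parameter, issued on the default stream.
    cudaMemset(d_buf, 0, bytes);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    // Stream-aware variants: return to the host immediately, queued in stream s.
    cudaMemsetAsync(d_buf, 0, bytes, s);
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, s);

    cudaStreamSynchronize(s);  // wait only for the work queued in s
    cudaStreamDestroy(s);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
```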
Upvotes: 2