jsc0218

Reputation: 482

CUDA overlap of data transfer and kernel execution, implicit synchronization for streams

  1. After reading the "overlap of data transfer and kernel execution" section of the "CUDA C Programming Guide", I have a question: what exactly does data transfer refer to? Does it include cudaMemsetAsync, cudaMemcpyAsync, cudaMemset, and cudaMemcpy? (Of course, the memory used for the memcpy is pinned.)

  2. In the implicit synchronization (streams) section, the guide says "a device memory set" may serialize the streams. So does that refer to cudaMemsetAsync, cudaMemcpyAsync, cudaMemset, or cudaMemcpy? I am not sure.

Upvotes: 2

Views: 1192

Answers (1)

Pavan Yalamanchili

Reputation: 12099

Any function call with Async at the end takes a stream parameter. Additionally, some of the libraries provided by the CUDA toolkit also offer the option of setting a stream. By using this, you can have multiple streams running concurrently.

This means that, unless you specifically create and set a stream, you will be using the default stream. For example, there are no dedicated default streams for data transfer and kernel execution; you have to create two (or more) streams yourself and assign each one a task of your choice.

A common use case is to have the two streams as mentioned in the programming guide. Keep in mind, this is only useful if you have multiple kernel launches. You can get the data needed for the next (independent) kernel or the next iteration of the current kernel while computing the results for the current kernel. This can maximize both compute and bandwidth capabilities.
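A minimal sketch of that pattern, assuming a hypothetical kernel myKernel, pinned host buffers h_in/h_out (allocated with cudaMallocHost), device buffers d_in/d_out, and sizes bytes/n/blocks/threads defined elsewhere:

    cudaStream_t stream[2];
    for (int i = 0; i < 2; ++i)
        cudaStreamCreate(&stream[i]);

    for (int i = 0; i < 2; ++i) {
        // Copy this chunk's input to the device in its own stream.
        cudaMemcpyAsync(d_in[i], h_in[i], bytes, cudaMemcpyHostToDevice, stream[i]);
        // Launch the kernel in the same stream; it waits only for that copy.
        myKernel<<<blocks, threads, 0, stream[i]>>>(d_in[i], d_out[i], n);
        // Copy the results back, still in the same stream.
        cudaMemcpyAsync(h_out[i], d_out[i], bytes, cudaMemcpyDeviceToHost, stream[i]);
    }

    cudaDeviceSynchronize();  // wait for both streams to finish
    for (int i = 0; i < 2; ++i)
        cudaStreamDestroy(stream[i]);

With pinned host memory, the copies queued in one stream can overlap with the kernel running in the other stream on devices that support it.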

For the function calls you mention, cudaMemcpy and cudaMemcpyAsync are the only ones performing data transfers. I don't think cudaMemset and cudaMemsetAsync can be termed data transfers.

Both cudaMemcpyAsync and cudaMemsetAsync can be used with streams, while cudaMemcpy and cudaMemset are blocking calls that do not make use of streams.
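For reference, a small sketch of how the blocking and stream versions differ (d_buf, h_buf, bytes, and stream are placeholder names, not from the question):

    // Blocking calls: issued in the default stream, no stream argument.
    cudaMemset(d_buf, 0, bytes);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    // Stream versions: queued on `stream` and return control to the host
    // (h_buf should be pinned for cudaMemcpyAsync to be truly asynchronous).
    cudaMemsetAsync(d_buf, 0, bytes, stream);
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);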

Upvotes: 2
