Jithin Pavithran
Reputation: 1351

Some questions about cuda streams

Code:

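#include <cstdio>
#include <cuda_runtime.h>

// Long-running kernel: the nested loop is meant to delay the printf.
__global__ void K1() {
    int p=1;
    for(int i=0; i<100000; ++i)
        for(int j=0; j<100000; ++j)
            p*=(i+100)*j;
    printf("K1\n");
}
// Short kernel: prints immediately.
__global__ void K2() {
    printf("K2\n");
}

int main() {
    int *ptr;
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    K1<<<1, 1, 0, s1>>>();      // queued on stream s1
    cudaHostAlloc(&ptr, 1, 0);  // page-locked host allocation between the two launches
    K2<<<1, 1, 0, s2>>>();      // queued on stream s2
    cudaDeviceSynchronize();    // wait for both kernels and flush device printf output
    return 0;
}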

Output:

K2
K1

Questions:

  1. Is s1 the same as the default stream?
  2. As per this documentation,

Two commands from different streams cannot run concurrently if any one of the following operations is issued in-between them by the host thread:

  • a page-locked host memory allocation,

shouldn't K2 start only after K1 finishes?

Upvotes: 1

Views: 254

Answers (1)

Florent DUGUET
Reputation: 2926

On the first point, s1 is not the default stream.
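As a minimal sketch of that point: a stream returned by cudaStreamCreate is a distinct handle, while the default stream is addressed by the null handle (or by the special handles cudaStreamLegacy and cudaStreamPerThread). The helper name isDefaultStream below is hypothetical and is not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API call): true if `s` is one of the
// handles that name a default stream.
bool isDefaultStream(cudaStream_t s) {
    return s == nullptr || s == cudaStreamLegacy || s == cudaStreamPerThread;
}

int main() {
    cudaStream_t s1 = nullptr;
    if (cudaStreamCreate(&s1) != cudaSuccess) {
        printf("cudaStreamCreate failed\n");
        return 1;
    }
    // A created stream gets its own handle, so this is expected to report "no".
    printf("s1 is a default stream: %s\n", isDefaultStream(s1) ? "yes" : "no");
    // The null handle always refers to the default stream.
    printf("null handle is a default stream: %s\n",
           isDefaultStream(nullptr) ? "yes" : "no");
    cudaStreamDestroy(s1);
    return 0;
}
```

To launch into the default stream you would omit the stream argument of the launch or pass 0 in its place.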

On the second point, running your code on a small system, I got the following profiler timeline.

[Image: profiler timeline]

My understanding is that there is a delay between the host scheduling a kernel launch and the kernel actually starting on the device. This is no surprise, since kernel launches into streams are asynchronous with respect to the host. As a result, the cudaHostAlloc call completes before either kernel has started executing: it occurs in between the two kernel launch calls, BUT before any kernel actually runs.
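One way to test this explanation is to make sure K1 is really executing before cudaHostAlloc is called. The sketch below is an assumption-laden variant of the question's code, not a definitive experiment: K1 raises a flag in mapped page-locked memory when it starts and then spins on clock64(), and the host waits for that flag before allocating and launching K2. The function name runExperiment, the flag variables, and the spin length are all made up for illustration, and the code assumes a device with unified addressing so that mapped host memory is usable without extra device flags. Which line prints first depends on the driver and GPU, so no expected output is given.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Raises a host-visible flag, then busy-waits for roughly `cycles` GPU clock ticks.
__global__ void K1(volatile int *started, long long cycles) {
    *started = 1;             // tell the host that K1 is now executing
    __threadfence_system();   // make the write visible to the host
    long long t0 = clock64();
    while (clock64() - t0 < cycles) { }
    printf("K1\n");
}

__global__ void K2() {
    printf("K2\n");
}

// Hypothetical driver function for the experiment; returns the final CUDA status.
cudaError_t runExperiment() {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Mapped page-locked flag, allocated before any kernel is launched.
    int *h_flag = nullptr;
    int *d_flag = nullptr;
    cudaHostAlloc((void **)&h_flag, sizeof(int), cudaHostAllocMapped);
    *h_flag = 0;
    cudaHostGetDevicePointer((void **)&d_flag, h_flag, 0);
    volatile int *started = h_flag;

    K1<<<1, 1, 0, s1>>>(d_flag, 1000000000LL);

    // Wait until K1 has actually begun. cudaStreamQuery is non-blocking and
    // also prompts the driver to submit any launch it is still holding back.
    while (*started == 0) cudaStreamQuery(s1);

    int *ptr = nullptr;
    cudaHostAlloc((void **)&ptr, sizeof(int), cudaHostAllocDefault);  // issued while K1 runs
    K2<<<1, 1, 0, s2>>>();

    cudaError_t status = cudaDeviceSynchronize();
    cudaFreeHost(ptr);
    cudaFreeHost(h_flag);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return status;
}

int main() {
    return runExperiment() == cudaSuccess ? 0 : 1;
}
```

If K2 still prints before K1 here, the allocation did not serialize the two streams on that system; if K1 prints first, it is consistent with the documented restriction applying once K1 is genuinely in flight.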

Upvotes: 1
