How to ensure that CUDA kernels run sequentially and no CPU code executes before both finish

Question

Suppose we have the following CUDA code:

    kernel1<<<...>>>(gpu_out, gpu_in, THREADS);
    cerr << "a: " << cudaGetErrorString(cudaGetLastError()) << endl;

    cudaDeviceSynchronize();

    kernel2<<<...>>>(gpu_out2, gpu_out, gpu_in);
    cerr << "b: " << cudaGetErrorString(cudaGetLastError()) << endl;

    cudaDeviceSynchronize();

    cout << "c " << endl;

I need gpu_out to be fully computed before the next kernel starts, and both kernels must finish before the remaining CPU code runs.

Even though I included the cudaDeviceSynchronize() calls, the code does not seem to run sequentially; the output looks like this:

    a: no error
    c
    b: no error

Robert Crovella · Accepted Answer

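You're misinterpreting the output. Your code as written will execute sequentially: each cudaDeviceSynchronize() blocks the host thread until all previously launched device work has completed, so kernel2 is not launched until kernel1 has finished, and "c" is not printed until kernel2 has finished.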

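Change all your stream I/O to use the same stream, either cerr or cout, not both. cerr and cout write to separate output streams (stderr and stdout), and whatever displays them (a terminal, an IDE console, a log capture) may merge the two in an order that does not match the order in which your program wrote them.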
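One way to apply this advice is to send every progress message through a single helper that writes to one std::ostream. The sketch below assumes a hypothetical helper named log_step and a self-check named log_order_is_preserved (neither is from the original answer or the CUDA API). It is plain C++, so it compiles in a .cu file next to the kernels. It defaults to std::cerr, which the question already uses for the "a:" and "b:" lines, and because it accepts any std::ostream you can capture the messages in a std::ostringstream to check their order.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): every progress message
// goes through the same std::ostream, so the messages are displayed in the
// order the host code produced them. Defaults to std::cerr.
inline void log_step(const std::string& msg, std::ostream& out = std::cerr) {
    out << msg << std::endl;  // std::endl writes '\n' and flushes the stream
}

// Hypothetical self-check: capture three messages in a string stream and
// confirm they come out in the same order they were written.
inline bool log_order_is_preserved() {
    std::ostringstream captured;
    log_step("a: no error", captured);
    log_step("b: no error", captured);
    log_step("c", captured);
    return captured.str() == "a: no error\nb: no error\nc\n";
}
```

In the question's code, the three output lines would become log_step(std::string("a: ") + cudaGetErrorString(cudaGetLastError())); for the first kernel, the same with "b: " for the second, and log_step("c"); in place of the cout line, so all three messages share one stream.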
