smihael

Reputation: 923

How to ensure that CUDA kernels run sequentially and no CPU code executes before both finish

Suppose we have the following CUDA code:

    kernel1<<<blockGrid, threadBlock>>>(gpu_out, gpu_in, THREADS);
    cerr << "a: " << cudaGetErrorString(cudaGetLastError()) << endl;

    cudaDeviceSynchronize();

    kernel2<<<blockGrid, threadBlock>>>(gpu_out2, gpu_out, gpu_in);
    cerr << "b: " << cudaGetErrorString(cudaGetLastError()) << endl;

    cudaDeviceSynchronize();

    cout << "c " << endl;

I need gpu_out to be fully processed by kernel1 before kernel2 starts, and both kernels should finish their work before the remaining CPU code runs.

Even though I included the cudaDeviceSynchronize() calls, the code does not run sequentially, since the output looks like this:

    a: no error
    c
    b: no error

Upvotes: 2

Views: 842

Answers (2)

Bahbar

Reputation: 18015

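cerr and cout are separate streams with different buffering: cout is buffered, while cerr is flushed after every write. The order in which their output reaches your console therefore does not have to match the order in which the calls that wrote to them were executed. Try sending the output that currently goes to cout to cerr instead, and the lines should appear in the expected order.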
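The effect can be reproduced without a GPU. The following is a minimal sketch, not the asker's actual setup: `SharedSinkBuf`, `demo_two_streams`, and the `console` string are hypothetical names standing in for two streams that feed one console, where one stream flushes after every write (as cerr does) and the other holds its text until it is flushed.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical buffer: collects text and appends it to a shared "console"
// string only when the owning stream is flushed.
class SharedSinkBuf : public std::stringbuf {
public:
    explicit SharedSinkBuf(std::string& console) : console_(console) {}

protected:
    int sync() override {
        console_ += str();  // hand the pending text to the console
        str("");            // clear the pending text
        return 0;
    }

private:
    std::string& console_;
};

// Writes "1" before "2", but through two differently flushed streams.
std::string demo_two_streams() {
    std::string console;
    SharedSinkBuf out_buf(console), err_buf(console);
    std::ostream out(&out_buf), err(&err_buf);
    err << std::unitbuf;  // flush after every insertion, as cerr does

    out << "1 via out\n";  // written first, but stays in out's buffer
    err << "2 via err\n";  // written second, reaches the console at once
    out.flush();           // only now does the first line arrive
    return console;
}
```

Calling `demo_two_streams()` returns the second message ahead of the first, even though the writes were issued in order. Which interleaving you see in a real run depends on how the terminal or IDE collects the two streams.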

Upvotes: 2

Robert Crovella

Reputation: 152174

You're misinterpreting the output. Your code as written will execute sequentially.

Change all your stream I/O to use the same stream, either cerr or cout, not both.
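A sketch of that change, kept host-only so it can be tried without a GPU. The `report_steps` helper is hypothetical and stands in for the three output statements in the question. The kernel launches and the cudaGetErrorString(cudaGetLastError()) calls are replaced by comments and a fixed "no error" string, which is an assumption made purely for illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: every message goes to the one stream the caller passes,
// so the lines appear in the order the statements execute.
void report_steps(std::ostream& log) {
    // kernel1 would be launched here (omitted in this host-only sketch)
    log << "a: " << "no error" << std::endl;

    // cudaDeviceSynchronize(), then the kernel2 launch, would go here
    log << "b: " << "no error" << std::endl;

    // the second cudaDeviceSynchronize() would go here
    log << "c " << std::endl;
}
```

Calling `report_steps(std::cerr);` (or `report_steps(std::cout);`) writes a, b, c in that order, because a single stream preserves the order of the writes made to it.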

Upvotes: 2
