Reputation: 923
Suppose we have the following CUDA code:
kernel1<<<blockGrid, threadBlock>>>(gpu_out, gpu_in, THREADS);
cerr << "a: " << cudaGetErrorString(cudaGetLastError()) << endl;
cudaDeviceSynchronize();
kernel2<<<blockGrid, threadBlock>>>(gpu_out2, gpu_out, gpu_in);
cerr << "b: " << cudaGetErrorString(cudaGetLastError()) << endl;
cudaDeviceSynchronize();
cout << "c " << endl;
I need gpu_out to be fully processed before the next kernel starts, and both kernels must finish their work before the remaining CPU code executes.
Even though I included the cudaDeviceSynchronize() calls, the code does not run sequentially, since the output looks like this:
a: no error
c
b: no error
Upvotes: 2
Views: 842
Reputation: 18015
cerr and cout are separate streams: cout is buffered, while cerr is unbuffered by default. The order in which their output shows up on your console (or in whatever captures it) is therefore not tied to the order in which the calls writing to them execute. Try sending the "c" line to cerr instead of cout to see the lines ordered properly.
Upvotes: 2
Reputation: 152174
You're misinterpreting the output. Your code as written will execute sequentially.
Change all your stream I/O to use the same stream, either cerr or cout, not both.
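As a rough illustration of that suggestion, here is a minimal host-side sketch in which every message goes through one stream. The helpers `report`, `run_steps`, and `captured_order` are hypothetical names introduced only for this example, and the kernel launches and cudaDeviceSynchronize() calls from the question are left as comments so the snippet compiles without the CUDA toolkit.

```cpp
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): writes one labelled
// status line to the given stream and flushes it right away.
void report(std::ostream& out, const std::string& label, const std::string& status) {
    out << label << status << std::endl;  // std::endl writes '\n' and flushes
}

// Stand-in for the host code in the question. The comments mark where the
// kernel launches and synchronisation calls would sit in the real program.
void run_steps(std::ostream& out) {
    // kernel1<<<blockGrid, threadBlock>>>(gpu_out, gpu_in, THREADS);
    report(out, "a: ", "no error");  // real code: cudaGetErrorString(cudaGetLastError())
    // cudaDeviceSynchronize();
    // kernel2<<<blockGrid, threadBlock>>>(gpu_out2, gpu_out, gpu_in);
    report(out, "b: ", "no error");
    // cudaDeviceSynchronize();
    report(out, "c", "");
}

// Captures the messages in memory to show that, within a single stream,
// they appear in exactly the order in which they were written.
std::string captured_order() {
    std::ostringstream buffer;
    run_steps(buffer);
    return buffer.str();
}
```

Calling `run_steps(std::cerr)` (or `run_steps(std::cout)`) from `main` should then print a, b, c in program order, because a single stream cannot reorder its own writes.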
Upvotes: 2