Reputation: 409
In the CUDA manual, the explanation of cudaStreamSynchronize(stream) says:
Blocks until stream has completed all operations. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the stream is finished with all of its tasks.
My understanding is that this barrier blocks the host (i.e. all the devices in a multi-GPU setup) until all previously issued operations in the stream have finished. Am I right?
And what about cudaDeviceSynchronize() in a multi-GPU task? Does it block all the devices until every task issued on the device selected by cudaSetDevice(deviceid) has finished, or does it block the host until all operations previously issued on all the devices have finished?
Upvotes: 0
Views: 1125
Reputation: 409
I found the answer to my question and am posting it here for anyone who runs into the same problem. Quoting from the CUDA programming guide:
cudaDeviceSynchronize() waits until all preceding commands in all streams of all host threads have completed.
cudaStreamSynchronize() takes a stream as a parameter and waits until all preceding commands in the given stream have completed. It can be used to synchronize the host with a specific stream, allowing other streams to continue executing on the device.
cudaStreamWaitEvent() takes a stream and an event as parameters (see Events for a description of events) and makes all the commands added to the given stream after the call to cudaStreamWaitEvent() delay their execution until the given event has completed.
cudaStreamQuery() provides applications with a way to know if all preceding commands in a stream have completed.
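For the multi-GPU case asked about in the question, here is a minimal sketch of how these four calls might be combined. It relies on the fact that cudaDeviceSynchronize() acts only on the current device (the one last selected with cudaSetDevice()), so waiting for every GPU means looping over the devices. The kernel addOne, the CHECK macro, and the buffer size are hypothetical names chosen for illustration; they do not come from the guide.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: adds 1.0f to every element.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount == 0) {
        std::printf("No CUDA device found.\n");
        return 0;
    }

    const int n = 1 << 20;
    const int blocks = (n + 255) / 256;
    std::vector<cudaStream_t> streams(deviceCount);
    std::vector<cudaEvent_t> events(deviceCount);
    std::vector<float *> buffers(deviceCount);

    // Queue asynchronous work on one stream per device.
    for (int d = 0; d < deviceCount; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaStreamCreate(&streams[d]));
        CHECK(cudaEventCreate(&events[d]));
        CHECK(cudaMalloc(&buffers[d], n * sizeof(float)));
        CHECK(cudaMemsetAsync(buffers[d], 0, n * sizeof(float), streams[d]));
        addOne<<<blocks, 256, 0, streams[d]>>>(buffers[d], n);
        CHECK(cudaGetLastError());
        CHECK(cudaEventRecord(events[d], streams[d]));
    }

    // cudaStreamWaitEvent: work queued on streams[0] after this call is
    // delayed until the last device's event has completed. The host does
    // not block here.
    CHECK(cudaSetDevice(0));
    CHECK(cudaStreamWaitEvent(streams[0], events[deviceCount - 1], 0));
    addOne<<<blocks, 256, 0, streams[0]>>>(buffers[0], n);
    CHECK(cudaGetLastError());

    // cudaStreamQuery: non-blocking check of a single stream.
    cudaError_t status = cudaStreamQuery(streams[0]);
    if (status == cudaSuccess) {
        std::printf("streams[0] has already finished\n");
    } else if (status == cudaErrorNotReady) {
        std::printf("streams[0] is still running\n");
    } else {
        CHECK(status);
    }

    // cudaStreamSynchronize: blocks the host until streams[0] is drained.
    // It does not wait for other streams or other devices.
    CHECK(cudaStreamSynchronize(streams[0]));

    // cudaDeviceSynchronize: blocks the host until the *current* device is
    // idle, so select each device in turn to wait for all of them.
    for (int d = 0; d < deviceCount; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaDeviceSynchronize());
    }

    // Read back one value from device 0 and release all resources.
    float first = 0.0f;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemcpy(&first, buffers[0], sizeof(float), cudaMemcpyDeviceToHost));
    std::printf("buffers[0][0] = %f\n", first);

    for (int d = 0; d < deviceCount; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaFree(buffers[d]));
        CHECK(cudaEventDestroy(events[d]));
        CHECK(cudaStreamDestroy(streams[d]));
    }
    return 0;
}
```

In this sketch none of the calls blocks a device on behalf of another one: the synchronize calls block the host thread, and only cudaStreamWaitEvent() creates a dependency between streams (here, between two GPUs when more than one is present).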
Upvotes: 1