Reputation: 1043
Does the following code invoke all 4 kernels in parallel, and do the 4 waits block until every kernel has completed?
event1 = event2 = event3 = event4 = 0;
printf("sending enqueue task..\n");
clEnqueueTask(command_queue, calculate1, 0, NULL, &event1);
clEnqueueTask(command_queue, calculate2, 0, NULL, &event2);
clEnqueueTask(command_queue, calculate3, 0, NULL, &event3);
clEnqueueTask(command_queue, calculate4, 0, NULL, &event4);
printf("waiting after enqueuing task..\n");
clWaitForEvents(1, &event1);
clWaitForEvents(1, &event2);
clWaitForEvents(1, &event3);
clWaitForEvents(1, &event4);
Or, is this the right way to invoke all the kernels in parallel? Is it even possible? Which device info do I need to query to confirm it?
Upvotes: 0
Views: 684
Reputation: 9925
These tasks might execute in parallel if you are using an out-of-order command queue and the device supports executing multiple kernels concurrently. Unfortunately, there isn't any device info query that tells you whether the device has this capability, so you'll have to compare the start/end times of the resulting events (via clGetEventProfilingInfo, on a queue created with CL_QUEUE_PROFILING_ENABLE) to see whether it actually happened. An alternative way to achieve parallel kernel execution is to use multiple command queues (as discussed in the comments). Note that this kind of coarse-grained task parallelism won't execute particularly efficiently on massively parallel architectures such as GPUs.
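As a rough illustration of the start/end time check, here is a minimal sketch of the comparison step only. It assumes the timestamps have already been read from each event; the OpenCL calls that would fill them in are shown in the comment. The names task_span, spans_overlap and count_overlapping_pairs are hypothetical helpers made up for this example, not part of OpenCL.

```c
#include <assert.h>
#include <stdint.h>

/* Start/end timestamps of one command, in nanoseconds.
 * Assumption: with OpenCL these would be read from an event on a queue
 * created with CL_QUEUE_PROFILING_ENABLE, for example:
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
 *                           sizeof(cl_ulong), &start_ns, NULL);
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
 *                           sizeof(cl_ulong), &end_ns, NULL);
 * task_span itself is a hypothetical type for this sketch. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} task_span;

/* Returns 1 if the two execution intervals overlap in time, 0 otherwise.
 * Intervals that merely touch (one ends exactly when the other starts)
 * are treated as not overlapping. */
int spans_overlap(task_span a, task_span b)
{
    assert(a.start_ns <= a.end_ns && b.start_ns <= b.end_ns);
    return a.start_ns < b.end_ns && b.start_ns < a.end_ns;
}

/* Counts how many pairs among n spans overlap.
 * A result of 0 means the commands ran strictly one after another. */
int count_overlapping_pairs(const task_span *spans, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            count += spans_overlap(spans[i], spans[j]);
    return count;
}
```

With the four events from the question, you would fill an array of four task_span values after the waits return and pass it to count_overlapping_pairs. A result of 0 suggests the kernels were serialized. Any positive count means at least two of them ran concurrently.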
Rather than waiting for each event individually, you could just call clFinish(command_queue) to wait for all commands to complete. You might also want to experiment with calling clFlush(command_queue) immediately after enqueuing all the tasks to ensure that they are all submitted to the device.
Upvotes: 5