Reputation: 1043
I have four jobs (grouped into two tasks) that need to be executed in parallel on an AMD OpenCL CPU device and a GPU device. As far as I know, clEnqueueNDRangeKernel on the AMD OpenCL CPU device returns promptly (non-blocking) if a NULL event is passed.
TASK 1: So first I call clEnqueueNDRangeKernel on the AMD OpenCL CPU device for job 1, after which the host gets control back promptly.
ret = clEnqueueNDRangeKernel(command_queue_amd, icm_kernel_amd, 1, NULL, &glob, &local, 0, NULL, NULL);
TASK 2: Then the host calls clEnqueueNDRangeKernel on the GPU device with GPU kernel 1 for job 2, GPU kernel 2 for job 3 and GPU kernel 3 for job 4, which enqueues them serially.
ret = clEnqueueNDRangeKernel(command_queue_gpu, icm_kernel_gpu[0], 1, NULL, &glob, &local, 0, NULL, NULL);
ret = clEnqueueNDRangeKernel(command_queue_gpu, icm_kernel_gpu[1], 1, NULL, &glob, &local, 0, NULL, NULL);
ret = clEnqueueNDRangeKernel(command_queue_gpu, icm_kernel_gpu[2], 1, NULL, &glob, &local, 0, NULL, NULL);
These calls are not returning promptly to the host. Then I read the buffer for the GPU and then for the CPU:
ret = clEnqueueReadBuffer(command_queue_gpu, Buffer_gpu, CL_TRUE, 0, count * sizeof(double), arr_gpu, 0, NULL, NULL);
ret = clEnqueueReadBuffer(command_queue_amd, Buffer_amd, CL_TRUE, 0, count * sizeof(double), arr_cpu, 0, NULL, NULL);
My question is: are both tasks actually running in parallel? Is there any profiler or logic to detect such behaviour? Any comments/logic/pointers will be appreciated.
Upvotes: 1
Views: 1069
Reputation: 8410
Let me write a proper answer:
The parallel execution of the kernels depends on the device/queue model used. From the general "spec" point of view: commands enqueued to the same in-order command queue execute one after another; they can only overlap if you use an out-of-order queue (with explicit event dependencies) or separate command queues.
But from the HW point of view (nVIDIA, AMD, etc.): even kernels that the spec allows to overlap may still be serialized by the hardware, because support for concurrent kernel execution on a single device is limited and vendor dependent.
In a multi-device setting, this constraint is relaxed, and the kernels can run in parallel on different devices. But in order to run fully in parallel there are some rules to meet: each device needs its own command queue, the kernels on the two devices must not depend on each other's buffers or events, the enqueue calls in between must not block, and each queue should be flushed (clFlush) so the work is actually submitted to its device before the host blocks on anything (see the sketch below).
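A minimal sketch of that setup, reusing the kernel and buffer names from the question. The names context, cpu_device and gpu_device, and the clFlush placement, are my own assumptions, not code from the question:

/* One in-order queue per device, both created on the same context. */
cl_command_queue command_queue_amd = clCreateCommandQueue(context, cpu_device, 0, &ret);
cl_command_queue command_queue_gpu = clCreateCommandQueue(context, gpu_device, 0, &ret);

/* Job 1 on the CPU device; flush so the CPU starts working right away. */
ret = clEnqueueNDRangeKernel(command_queue_amd, icm_kernel_amd, 1, NULL, &glob, &local, 0, NULL, NULL);
clFlush(command_queue_amd);

/* Jobs 2-4 on the GPU device (serial within this in-order queue); flush again. */
ret = clEnqueueNDRangeKernel(command_queue_gpu, icm_kernel_gpu[0], 1, NULL, &glob, &local, 0, NULL, NULL);
ret = clEnqueueNDRangeKernel(command_queue_gpu, icm_kernel_gpu[1], 1, NULL, &glob, &local, 0, NULL, NULL);
ret = clEnqueueNDRangeKernel(command_queue_gpu, icm_kernel_gpu[2], 1, NULL, &glob, &local, 0, NULL, NULL);
clFlush(command_queue_gpu);

/* Block only at the end, once both devices already have their work. */
ret = clEnqueueReadBuffer(command_queue_gpu, Buffer_gpu, CL_TRUE, 0, count * sizeof(double), arr_gpu, 0, NULL, NULL);
ret = clEnqueueReadBuffer(command_queue_amd, Buffer_amd, CL_TRUE, 0, count * sizeof(double), arr_cpu, 0, NULL, NULL);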
To measure whether parallel execution is in fact happening, I recommend using events.
You can do it the hard way (manually, as sketched below) or you can use CodeXL, nSight or the Intel SDK. Those tools collect these metrics for you by hooking the OpenCL calls, and give you the insight you need (in a very convenient format, with figures and statistics).
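For the manual route, here is a minimal sketch: attach an event to each enqueue instead of passing NULL, then compare the device timestamps; if the CPU kernel's [start, end] interval overlaps the GPU kernels' interval, the devices really worked concurrently. It assumes both queues were created with CL_QUEUE_PROFILING_ENABLE, and the names cpu_ev and gpu_ev are my own:

cl_event cpu_ev, gpu_ev[3];
cl_ulong cpu_start, cpu_end, gpu_start, gpu_end;

/* Attach events to the enqueues instead of passing NULL. */
ret = clEnqueueNDRangeKernel(command_queue_amd, icm_kernel_amd, 1, NULL, &glob, &local, 0, NULL, &cpu_ev);
clFlush(command_queue_amd);
for (int i = 0; i < 3; ++i)
    ret = clEnqueueNDRangeKernel(command_queue_gpu, icm_kernel_gpu[i], 1, NULL, &glob, &local, 0, NULL, &gpu_ev[i]);
clFlush(command_queue_gpu);

/* Wait for everything, then read the device timestamps (in nanoseconds). */
clWaitForEvents(1, &cpu_ev);
clWaitForEvents(3, gpu_ev);
clGetEventProfilingInfo(cpu_ev,    CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &cpu_start, NULL);
clGetEventProfilingInfo(cpu_ev,    CL_PROFILING_COMMAND_END,   sizeof(cl_ulong), &cpu_end,   NULL);
clGetEventProfilingInfo(gpu_ev[0], CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &gpu_start, NULL);
clGetEventProfilingInfo(gpu_ev[2], CL_PROFILING_COMMAND_END,   sizeof(cl_ulong), &gpu_end,   NULL);

/* Overlapping intervals mean the devices were busy at the same time.
   Note: timers of two different devices are not guaranteed to share a base,
   so also compare against host-side wall-clock time if in doubt. */
printf("CPU kernel:  %llu -> %llu ns\n", (unsigned long long)cpu_start, (unsigned long long)cpu_end);
printf("GPU kernels: %llu -> %llu ns\n", (unsigned long long)gpu_start, (unsigned long long)gpu_end);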
Upvotes: 4
Reputation: 1814
Though the point about command queues has already been made, there is something to add.
You can use the AMD CodeXL tool to collect an application timeline and see whether the tasks are done in parallel. Another very simple approach: watch the CPU load level in your OS task manager and, at the same time, the GPU load in Catalyst Control Center. If both load levels rise at the same time, the tasks are running in parallel.
Upvotes: 1