Reputation: 14016
I'm very new to the whole OpenCL world, so I'm following some beginner tutorials. I'm trying to combine this and this to compare the time required to add two arrays together on different devices. However, I'm getting confusing results. Since the code is too long to post here, I made this GitHub Gist.
On my Mac I have 1 platform with 3 devices. When I manually set j in
cl_command_queue command_queue = clCreateCommandQueue(context, device_id[j], 0, &ret);
to 0, the calculation seems to run on the CPU (about 5.75 seconds). When I set it to 1 or 2, the calculation time drops drastically (0.01076 seconds), which I assume is because the calculation is being run on my Intel or AMD GPU. But then there are some issues:
I can set j to any higher number and it still seems to run on the GPUs.
The times for 0<j are suspiciously close. I wonder if they are really being run on different devices.
I clearly have no clue about OpenCL, so I would appreciate it if you could take a look at my code and let me know what my mistakes are and how I can fix them. Or maybe point me towards a good example that runs a calculation on different devices and compares the times.
P.S. I have also posted this question here on Reddit.
Upvotes: 1
Views: 213
Reputation: 769
Before submitting a question about an issue you are having, always remember to check for errors (specifically, in this case, that every API call returns CL_SUCCESS). The results are meaningless otherwise.
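A minimal sketch of what such a check could look like. The helper name cl_ok is made up for illustration, and the CL_SUCCESS fallback is only there so the snippet compiles without the OpenCL headers (the real headers define it as 0 as well):

```c
#include <stdio.h>

/* Fallback so this sketch compiles without the OpenCL headers.
 * When <CL/cl.h> or <OpenCL/opencl.h> is included first, the real
 * definition (also 0) is used instead. */
#ifndef CL_SUCCESS
#define CL_SUCCESS 0
#endif

/* Hypothetical helper: returns 1 if status is CL_SUCCESS, otherwise
 * prints the failing call and the raw error code to stderr and returns 0.
 * cl_int is a 32-bit signed integer, so a plain int holds it. */
int cl_ok(int status, const char *what)
{
    if (status == CL_SUCCESS)
        return 1;
    fprintf(stderr, "%s failed with OpenCL error %d\n", what, status);
    return 0;
}

/* Possible call site, using the line from the question:
 *
 *   cl_int ret;
 *   cl_command_queue command_queue =
 *       clCreateCommandQueue(context, device_id[j], 0, &ret);
 *   if (!cl_ok(ret, "clCreateCommandQueue"))
 *       return EXIT_FAILURE;
 */
```

With a check like this after every call, an invalid device handle would most likely show up as a failed clCreateCommandQueue instead of as a plausible-looking timing.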
In this specific case, the problem in your code is that when getting the device IDs, you're only requesting one device ID (line 60, third argument), meaning that everything else in the buffer is bogus, and the results for j > 0 are meaningless.
The only surprising thing is that it doesn't crash.
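To make the reasoning concrete: clGetDeviceIDs fills in at most num_entries handles (its third argument), no matter how many devices it reports as available. A rough sketch of that rule follows. The two helper names are invented for illustration, and platform_id in the comment is an assumed variable name, not necessarily what your Gist uses:

```c
/* Number of entries in the device buffer that clGetDeviceIDs actually
 * fills in: the smaller of the requested count (third argument,
 * num_entries) and the number of matching devices it reports. */
unsigned valid_device_entries(unsigned num_entries, unsigned num_devices)
{
    return num_entries < num_devices ? num_entries : num_devices;
}

/* Hypothetical guard: device_id[j] may only be used when j is below
 * the number of entries that were really written. */
int device_index_is_valid(unsigned j, unsigned num_entries,
                          unsigned num_devices)
{
    return j < valid_device_entries(num_entries, num_devices);
}

/* One way to request every device (query the count first, then fetch):
 *
 *   cl_uint num_devices = 0;
 *   clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 0, NULL, &num_devices);
 *   cl_device_id *device_id = malloc(num_devices * sizeof *device_id);
 *   clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL,
 *                  num_devices, device_id, NULL);
 */
```

With one entry requested and three devices present, device_index_is_valid(1, 1, 3) is 0, which is exactly the situation for j = 1 and j = 2 in the question.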
Also, when checking runtimes, use OpenCL events, not host-side clock times. In your case you're at least doing it after the clFinish, so you are ensuring that the kernel execution terminates, but you're essentially counting the time necessary for all the setup, rather than just the copy time.
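As a rough sketch of event-based timing: the profiling timestamps are 64-bit nanosecond counters, so the only arithmetic needed is a subtraction and a unit conversion. The helper name profiling_seconds is made up, and the variable names in the comment (queue, kernel, global, local) are assumptions, not taken from your Gist:

```c
#include <stdint.h>

/* Convert a pair of OpenCL profiling timestamps to seconds.
 * Both values are in nanoseconds (cl_ulong is a 64-bit unsigned
 * integer, so uint64_t is used here to avoid needing the headers).
 * Returns 0.0 if the timestamps are out of order. */
double profiling_seconds(uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns < start_ns)
        return 0.0;
    return (double)(end_ns - start_ns) / 1e9;
}

/* Possible call site. The queue must be created with the
 * CL_QUEUE_PROFILING_ENABLE property instead of 0:
 *
 *   cl_event ev;
 *   clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
 *                          &global, &local, 0, NULL, &ev);
 *   clWaitForEvents(1, &ev);
 *
 *   cl_ulong t0, t1;
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
 *                           sizeof t0, &t0, NULL);
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
 *                           sizeof t1, &t1, NULL);
 *   printf("kernel: %f s\n", profiling_seconds(t0, t1));
 */
```

The same pattern works for the events returned by clEnqueueWriteBuffer and clEnqueueReadBuffer, so the transfers and the kernel can be timed separately.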
Upvotes: 1