Reputation: 30
I am a novice at OpenCL and recently I have stumbled onto something which does not make sense to me.
I am using the Intel drivers on a Linux machine, and the device is a Xeon Phi coprocessor.
The problem is that when I give local_item_size
as an argument to
clEnqueueNDRangeKernel(commandQueue,
forceKernel, 1,
&localItemSize, &globalItemSize,
NULL, 0, NULL, &kernelDone);
and then print the global thread ID in the kernel with
int tid = get_global_id(0);
the thread IDs start from 1 and not from 0.
When I do not specify local_item_size and pass NULL as the argument instead, the IDs start correctly from 0.
At the moment I work around this by subtracting 1 from the return value of get_global_id(0).
In short: when I specify local_item_size, the tid starts from 1; when I pass NULL, it starts from 0.
Size setting code:
// Global item size
if (n <= NUM_THREADS) {
    globalItemSize = NUM_THREADS;
    localItemSize = 16;
} else if (n % NUM_THREADS != 0) {
    globalItemSize = (n / NUM_THREADS + 1) * NUM_THREADS;
} else {
    globalItemSize = n;
}
// Local item size
localItemSize = globalItemSize / NUM_THREADS;
Upvotes: 0
Views: 284
Reputation: 26868
The 4th parameter to clEnqueueNDRangeKernel is the array of global work offsets, not the local size; the local size is the 6th parameter. Your call should be:
clEnqueueNDRangeKernel(commandQueue,
forceKernel, 1,
NULL, &globalItemSize,
&localItemSize, 0, NULL, &kernelDone);
This is also why the IDs started at 1: the value of localItemSize was read as the global offset, so you requested an offset of 1.
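To see why a non-NULL 4th argument shifts every ID, the relation the OpenCL specification gives for global IDs can be written as a plain C function. This is only an illustrative sketch that runs on the host without OpenCL; the name global_id_1d is hypothetical and not part of any API.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID of a work-item in one dimension, following the relation in the
 * OpenCL specification:
 *   global_id = global_offset + group_id * local_size + local_id
 * global_id_1d is a hypothetical helper used only for illustration. */
size_t global_id_1d(size_t global_offset, size_t group_id,
                    size_t local_size, size_t local_id)
{
    /* A local ID is always smaller than the work-group size. */
    assert(local_id < local_size);
    return global_offset + group_id * local_size + local_id;
}
```

With a work-group size of 16, global_id_1d(0, 0, 16, 0) is 0, while global_id_1d(1, 0, 16, 0) is 1, which matches the behaviour described in the question.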
Upvotes: 3
Reputation: 9925
You are passing your work-group size as the wrong argument. The fourth argument of clEnqueueNDRangeKernel is the global work offset, which is why your global IDs appear shifted. The work-group size should be passed as the sixth argument:
clEnqueueNDRangeKernel(commandQueue,
forceKernel, 1, NULL,
&globalItemSize, &localItemSize,
0, NULL, &kernelDone);
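One related point, offered as a sketch: in OpenCL 1.x, once a work-group size is passed explicitly, the global size has to be a multiple of it, otherwise the call fails with CL_INVALID_WORK_GROUP_SIZE. A small host-side helper for rounding up might look like this; round_up_to_multiple is a hypothetical name, not an OpenCL function.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the nearest multiple of local_size.
 * round_up_to_multiple is a hypothetical helper, not part of OpenCL. */
size_t round_up_to_multiple(size_t n, size_t local_size)
{
    assert(local_size > 0);  /* avoid division by zero */
    return ((n + local_size - 1) / local_size) * local_size;
}
```

For example, round_up_to_multiple(100, 16) gives 112. The kernel then needs a bounds check such as if (tid >= n) return; so that the padding work-items do nothing.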
Upvotes: 3