Reputation: 68
I'm developing an accelerated component in OpenCL, using Xcode 4.5.1 and Grand Central Dispatch, guided by this tutorial.
The full kernel kept failing on the GPU, giving signal SIGABRT. I couldn't make much progress interpreting the error beyond that.
But I broke out aspects of the kernel to test, and I found something very peculiar involving assigning certain values to positions in an array within a loop.
Test scenario: give each thread a fixed range of array indices to initialize.
kernel void zero(size_t num_buckets, size_t positions_per_bucket, global int* array) {
size_t bucket_index = get_global_id(0);
if (bucket_index >= num_buckets) return;
for (size_t i = 0; i < positions_per_bucket; i++)
array[bucket_index * positions_per_bucket + i] = 0;
}
The above kernel fails. However, when I assign 1 instead of 0, the kernel succeeds (and my host code prints out the array of 1's). Based on a handful of tests on various integer values, I've only had problems with 0 and -1.
I've tried to outsmart the compiler with 1-1, (int) 0, etc, with no success. Passing zero in as a kernel argument worked though.
The assignment to zero does work outside of the context of a for loop:
array[bucket_index * positions_per_bucket] = 0;
The findings above were confirmed on two machines with different configurations. (OSX 10.7 + GeForce, OSX 10.8 + Radeon.) Furthermore, the kernel had no trouble when running on CL_DEVICE_TYPE_CPU -- it's just on the GPU.
Clearly, something ridiculous is happening, and it must be on my end, because "zero" can't be broken. Hopefully it's something simple. Thank you for your help.
Host code:
#include <stdio.h>
#include <OpenCL/OpenCL.h>
#include "zero.cl.h"
int main(int argc, const char* argv[]) {
dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);
size_t num_buckets = 64;
size_t positions_per_bucket = 4;
cl_int* h_array = malloc(sizeof(cl_int) * num_buckets * positions_per_bucket);
cl_int* d_array = gcl_malloc(sizeof(cl_int) * num_buckets * positions_per_bucket, NULL, CL_MEM_WRITE_ONLY);
dispatch_sync(queue, ^{
cl_ndrange range = { 1, { 0 }, { num_buckets }, { 0 } };
zero_kernel(&range, num_buckets, positions_per_bucket, d_array);
gcl_memcpy(h_array, d_array, sizeof(cl_int) * num_buckets * positions_per_bucket);
});
for (size_t i = 0; i < num_buckets * positions_per_bucket; i++)
printf("%d ", h_array[i]);
printf("\n");
}
Upvotes: 0
Views: 502
Reputation: 3381
Refer to the OpenCL standard, section 6, paragraph 8 "Restrictions", bullet point k (emphasis mine):
6.8 k. Arguments to kernel functions in a program cannot be declared with the built-in scalar types bool, half, size_t, ptrdiff_t, intptr_t, and uintptr_t. [...]
The fact that your compiler even let you build the kernel at all indicates it is somewhat broken.
So you might want to fix that... but if that doesn't fix it, then it looks like a compiler bug, plain and simple (of CLC, that is, the OpenCL compiler, not your host code). There is no reason this kernel should work with any constant other than 0, -1. Did you try updating your OpenCL driver, what about trying on a different operating system (though I suppose this code is OS X only)?
Upvotes: 1