Carmellose

Reputation: 5088

Open global_work_size misunderstanding

I'm trying to understand a simple OpenCL example, which is vector addition. The kernel is the following:

__kernel void addVec(__global double* a, __global double* b, __global double* c)
{
  size_t id = get_global_id(0);
  c[id] = a[id] + b[id];
}

For example, my input arrays have a size of 1 million elements each.

In my host program, I set global_work_size to exactly the size of the input arrays (1 million).

But when I set it to a smaller value, for example 1000, it also works with this kernel!

I don't understand why global_work_size can be smaller than the problem dimension while the OpenCL program still computes every element of the output array.

Could someone clarify this?

EDIT: here is the code where I copy the data:

size_t arraySize = 1000000;
const size_t global_work_size[1] = {512};

double *host_a = malloc(arraySize*sizeof(double));
double *host_b = malloc(arraySize*sizeof(double));
double *host_c = calloc(arraySize, sizeof(double));

...

// Create the input and output arrays in device memory for our calculation
device_a = clCreateBuffer(context, CL_MEM_READ_ONLY, arraySize*sizeof(double), NULL, NULL);
device_b = clCreateBuffer(context, CL_MEM_READ_ONLY, arraySize*sizeof(double), NULL, NULL);
device_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY, arraySize*sizeof(double), NULL, NULL);

...

// Copy data set into the input array in device memory. [host --> device]
status = clEnqueueWriteBuffer(command_queue, device_a, CL_TRUE, 0, arraySize*sizeof(double), host_a, 0, NULL, NULL);
status |= clEnqueueWriteBuffer(command_queue, device_b, CL_TRUE, 0, arraySize*sizeof(double), host_b, 0, NULL, NULL);

...

// Copy-back the results from the device [host <-- device]
clEnqueueReadBuffer(command_queue, device_c, CL_TRUE, 0, arraySize*sizeof(double), host_c, 0, NULL, NULL );

...
printf("checking result validity ...\n");
for (size_t i=0; i<arraySize; ++i)
  if(host_c[i] - 1 > 1e-6) // the array is supposed to be 1 everywhere
  {
    printf("*** ERROR! Invalid results ! host_c[%zi]=%.9lf\n", i, host_c[i]);
    break;
  }

Thanks

Upvotes: 0

Views: 99

Answers (1)

DarkZeros

Reputation: 8410

Your validity check is wrong: the condition host_c[i] - 1 > 1e-6 only flags values greater than 1, so any value below 1 passes as valid. It should test both directions, like this:

for (size_t i=0; i<arraySize; ++i){
  cl_double val = host_c[i] - 1; // the array is supposed to be 1 everywhere
  if((val > 1e-6) || (val < -1e-6)) 
  {
    printf("*** ERROR! Invalid results ! host_c[%zu]=%.9lf\n", i, host_c[i]);
    break;
  }
}

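With a global_work_size of 512, the kernel only writes the first 512 elements of c; the rest of device_c is never written. Uninitialized values in GPU memory are likely to be 0, so they pass your original one-sided check.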
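To make this concrete, here is a minimal CPU-only sketch (no OpenCL involved) that emulates the launch with a plain loop over work-item ids and compares the two checks. It assumes that unwritten output elements read as 0 (modelled with calloc) and that the inputs are 0.25 and 0.75 so every computed sum is exactly 1; those values and all function names are made up for illustration and are not from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* CPU stand-in for the addVec kernel: one call per work-item id. */
void add_vec_item(const double *a, const double *b, double *c, size_t id)
{
    c[id] = a[id] + b[id];
}

/* Emulates the launch: only ids 0..global_work_size-1 ever run. */
void launch(const double *a, const double *b, double *c, size_t global_work_size)
{
    for (size_t id = 0; id < global_work_size; ++id)
        add_vec_item(a, b, c, id);
}

/* Check from the question: only flags values greater than 1.
   Returns the first rejected index, or n if nothing is rejected. */
size_t first_error_one_sided(const double *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (c[i] - 1 > 1e-6)
            return i;
    return n;
}

/* Corrected check: flags values on either side of 1. */
size_t first_error_two_sided(const double *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double val = c[i] - 1;
        if (val > 1e-6 || val < -1e-6)
            return i;
    }
    return n;
}

/* Hypothetical driver: runs a (possibly too small) launch and returns the
   first index rejected by the chosen check, or array_size if none. */
size_t run_partial_launch(size_t array_size, size_t global_work_size, int two_sided)
{
    assert(global_work_size <= array_size);
    double *a = malloc(array_size * sizeof *a);
    double *b = malloc(array_size * sizeof *b);
    double *c = calloc(array_size, sizeof *c); /* assumption: unwritten output reads as 0 */
    assert(a && b && c);
    for (size_t i = 0; i < array_size; ++i) {
        a[i] = 0.25; /* illustrative inputs whose sum is exactly 1 */
        b[i] = 0.75;
    }
    launch(a, b, c, global_work_size);
    size_t idx = two_sided ? first_error_two_sided(c, array_size)
                           : first_error_one_sided(c, array_size);
    free(a);
    free(b);
    free(c);
    return idx;
}
```

Calling run_partial_launch(1000000, 512, 0) returns 1000000 (the one-sided check finds nothing wrong), while run_partial_launch(1000000, 512, 1) returns 512, the first element no work-item ever wrote.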

Additionally, remember that if you have run the program once with the full size, later runs may still read back the correctly processed data (even if you close and reopen the app), since GPU memory is not cleared when a buffer is created or destroyed.
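One way to rule out stale results is to overwrite the output with a sentinel value before each launch. The sketch below is again a CPU-only simulation, not real device code: a static array stands in for device memory that survives between runs, and the names, the array size of 4096 and the sentinel of -1 are all hypothetical. On a real device the same effect could probably be had by writing a pre-filled host array to device_c with clEnqueueWriteBuffer, or with clEnqueueFillBuffer on OpenCL 1.2 and later.

```c
#include <assert.h>
#include <stddef.h>

#define OUT_SIZE 4096 /* small stand-in for the 1 million element arrays */

/* Stand-in for device memory that outlives a single run (illustrative assumption). */
static double device_c_sim[OUT_SIZE];

/* Emulates the addVec launch: only ids below global_work_size write to c. */
void launch_add(double a, double b, size_t global_work_size)
{
    assert(global_work_size <= OUT_SIZE);
    for (size_t id = 0; id < global_work_size; ++id)
        device_c_sim[id] = a + b;
}

/* Overwrites the whole output with a value no valid result can equal. */
void fill_sentinel(double sentinel)
{
    for (size_t i = 0; i < OUT_SIZE; ++i)
        device_c_sim[i] = sentinel;
}

/* Two-sided check; returns the first bad index, or OUT_SIZE if none. */
size_t first_error(void)
{
    for (size_t i = 0; i < OUT_SIZE; ++i) {
        double val = device_c_sim[i] - 1;
        if (val > 1e-6 || val < -1e-6)
            return i;
    }
    return OUT_SIZE;
}

/* Hypothetical scenario: a full-size run followed by a 512-item run,
   with or without clearing the output in between. */
size_t second_run_first_error(int clear_between_runs)
{
    fill_sentinel(0.0);
    launch_add(0.25, 0.75, OUT_SIZE); /* run 1: full size, every element becomes 1 */
    if (clear_between_runs)
        fill_sentinel(-1.0);          /* wipe the stale results */
    launch_add(0.25, 0.75, 512);      /* run 2: too few work-items */
    return first_error();
}
```

Without the clear, second_run_first_error(0) returns OUT_SIZE: the leftover data from the first run makes the short launch look correct. With the clear, second_run_first_error(1) returns 512.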

Upvotes: 1
