Tim McGrand

Reputation: 77

OpenCL - Insert values in every n elements of an array

I have an array of 100 elements, and what I want to do is copy these 100 elements into every nth element of another array.

Say n is 3.

The new array would then hold [val1 0 0 val2 0 0 val3 0 0 ...] once the values have been copied to every nth element. In OpenCL, I tried keeping the current index in a buffer and adding n to it after each write. However, the current index always ends up holding the same value. Below is the code I have.

__kernel void ddc(__global float *inputArray, __global float *outputArray,  __const int interpolateFactor, __global int *currentIndex){
    int i = get_global_id(0);
    outputArray[currentIndex[0]] = inputArray[i];
    currentIndex[0] = currentIndex[0] + (interpolateFactor - 1);
    printf("index %i \n", currentIndex[0]);    
}

Host code for the currentIndex part:

int  *index;
index = (int*)malloc(2*sizeof(int));
index[0] = 0;

cl_mem currentIndex;
currentIndex = clCreateBuffer(
    context,
    CL_MEM_WRITE_ONLY,
    2 * sizeof(int),
    NULL,
    &status);
status = clEnqueueWriteBuffer(
    cmdQueue,
    currentIndex,
    CL_FALSE,
    0,
    2 * sizeof(int),
    index,
    0,
    NULL,
    NULL);
printf("Index enqueueWriteBuffer status: %i \n", status);
status |= clSetKernelArg(
    kernel,
    4,
    sizeof(cl_mem),
    &currentIndex);
printf("Kernel Arg currentIndex Factor status: %i \n", status);

If you are wondering why I am using an array with two elements, it's because I wasn't sure how to pass a single variable, so I implemented it the same way as the input and output arrays, which work. When I run the kernel with an interpolateFactor of 3, it always prints 2 for currentIndex.

Upvotes: 1

Views: 561

Answers (1)

Jovasa

Reputation: 445

If I understand correctly, you want each work-item to store the next index to use in currentIndex. This will not work: work-items run in parallel, and a write made by one work-item is not immediately visible to the others, so they can all read the same starting value. For this approach to work, the work-items would have to execute one after another.

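What you could do instead is compute the output index directly from the work-item ID:

__kernel void ddc(__global float *inputArray, __global float *outputArray,  __const int interpolateFactor, int start){
    int i = get_global_id(0);
    // Stride by interpolateFactor (not interpolateFactor-1) so that a factor of 3 writes to indices 0, 3, 6, ...
    outputArray[start+i*interpolateFactor] = inputArray[i];
}

The start argument is only needed if the first value may go somewhere other than index 0. Otherwise you could just drop it completely.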

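To pass start as a plain scalar argument, set it on the host like this:

int start = 0;
status |= clSetKernelArg(
    kernel,
    3, // Argument indices are zero-based, so the fourth parameter is index 3, not 4 as in the question. That might have been a problem to begin with.
    sizeof(int),
    &start);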

Hopefully this helps.

Upvotes: 2
