CUDA: how to move array elements

Question

I need to move each of the first k elements of a 1-D array by an offset, where the offsets are monotonically non-decreasing: if element i has offset offset1 and element i+1 has offset offset2, then offset2 >= offset1.

I wrote a kernel that is executed on each of the first k elements:

if (thread_id < k) {

  // compute offset

  if (offset) {
    int temp = a[thread_id];

    __syncthreads();

    a[thread_id + offset] = temp;
  }
}

However, when I test this with k = 3, the offsets are 0, 1, 1 (monotonically non-decreasing, as required). Element 0 stays in its position as expected, but element 1 is copied not only to element 2 (as its offset dictates) but also to element 3.

That is, it appears that thread 2 reads element 2 into its copy of temp only after thread 1 has already copied element 1 to element 2.

What am I doing wrong, and how can I fix it?

Thank you!

Answers (1)
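
Two things in the kernel above look problematic. First, __syncthreads() is called inside a branch that not every thread of the block takes (threads with thread_id >= k or with offset == 0 skip it), and a barrier inside divergent code has undefined behavior. Second, __syncthreads() only synchronizes the threads of one block, so even a correctly placed barrier would not order reads and writes across blocks. One possible way around both, sketched here rather than offered as a definitive implementation, is to avoid the in-place update: read from one buffer and write to another, so no thread can ever read a value that another thread has already moved. The sketch assumes the offsets are precomputed and passed in as an array (the original computes them inside the kernel), and that thread_id + offset always stays inside the array. The names move_elements, move_out_of_place, src, dst and offsets are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Out-of-place move: every thread reads only from src and writes only to dst.
// Assumption: offsets[] holds the precomputed, non-decreasing offsets.
__global__ void move_elements(const int *src, int *dst, const int *offsets, int k)
{
    int thread_id = blockIdx.x * blockDim.x + threadIdx.x;

    if (thread_id < k) {
        int offset = offsets[thread_id];
        if (offset) {
            // Non-decreasing offsets make thread_id + offset strictly
            // increasing, so no two threads write the same destination.
            dst[thread_id + offset] = src[thread_id];
        }
    }
}

// Hypothetical host helper: returns the array after the move.
// Assumption: i + offsets[i] < a.size() for every i < offsets.size().
std::vector<int> move_out_of_place(const std::vector<int> &a,
                                   const std::vector<int> &offsets)
{
    const int n = static_cast<int>(a.size());
    const int k = static_cast<int>(offsets.size());
    std::vector<int> result(a);
    if (n == 0 || k == 0) return result;

    int *d_src = nullptr, *d_dst = nullptr, *d_off = nullptr;
    cudaMalloc(&d_src, n * sizeof(int));
    cudaMalloc(&d_dst, n * sizeof(int));
    cudaMalloc(&d_off, k * sizeof(int));

    cudaMemcpy(d_src, a.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    // Start dst as a copy of src so positions nobody writes keep their old value.
    cudaMemcpy(d_dst, d_src, n * sizeof(int), cudaMemcpyDeviceToDevice);
    cudaMemcpy(d_off, offsets.data(), k * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (k + threads - 1) / threads;
    move_elements<<<blocks, threads>>>(d_src, d_dst, d_off, k);

    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(result.data(), d_dst, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_src);
    cudaFree(d_dst);
    cudaFree(d_off);
    return result;
}
```

For the case in the question, calling move_out_of_place({10, 20, 30, 40}, {0, 1, 1}) should give 10 20 20 30: element 0 stays, element 1 is copied to position 2, and the original element 2 (not the moved copy of element 1) lands in position 3. If the whole array is handled by a single block and the update must stay in place, an alternative is to keep the original structure but hoist __syncthreads() out of both if statements, so that every thread in the block reaches it between the read and the write.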