Matthias
Matthias

Reputation: 516

How would you implement this function in CUDA? (offsets in sorted integer vector)

I have a sorted integer array on the device, e.g.:

[0,0,0,1,1,2,2]

And I want the offsets to each element in another array:

[0,3,5]

(since the first 0 is at position 0, the first 1 at position 3 and so on) I know how many different elements there will be beforehand. How would you implement this efficiently in CUDA? I'm not asking for code, but a high level description of the algorithm you would implement to compute this transformation. I already hat a look at the various functions in the thrust name space, but could not think of any combination of thrust functions to achieve this. Also, does this transformation have a widely accepted name?

Upvotes: 3

Views: 1189

Answers (4)

Jared Hoberock
Jared Hoberock

Reputation: 11416

You can solve this in Thrust using thrust::unique_by_key_copy with thrust::counting_iterator. The idea is to treat your integer array as the keys argument to unique_by_key_copy and to use a sequence of ascending integers (i.e., counting_iterator) as the values. unique_by_key_copy will compact the values array into the indices of each unique key:

#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/unique.h>
#include <thrust/copy.h>
#include <iterator>
#include <iostream>

int main()
{
  thrust::device_vector<int> keys(7);
  keys[0] = 0; keys[1] = 0; keys[2] = 0;
  keys[3] = 1; keys[4] = 1; keys[5] = 2; keys[6] = 2;

  std::cout << "keys before unique_by_key_copy: [ ";
  thrust::copy(keys.begin(), keys.end(), std::ostream_iterator<int>(std::cout," "));
  std::cout << "]" << std::endl;

  thrust::device_vector<int> offsets(3);

  thrust::unique_by_key_copy(keys.begin(), keys.end(),          // keys
                             thrust::make_counting_iterator(0), // [0, 1, 2, 3, ...] are the values
                             thrust::make_discard_iterator(),   // discard the compacted keys
                             offsets.begin());                  // the offsets are the values

  std::cout << "offsets after unique_by_key_copy: [ ";
  thrust::copy(offsets.begin(), offsets.end(), std::ostream_iterator<int>(std::cout," "));
  std::cout << "]" << std::endl;

  return 0;
}

Here's the output:

$ nvcc test.cu -run
keys before unique_by_key_copy: [ 0 0 0 1 1 2 2 ]
offsets after unique_by_key_copy: [ 0 3 5 ]

Upvotes: 4

ArchaeaSoftware
ArchaeaSoftware

Reputation: 4422

Scan is the algorithm you're looking for. If you don't have an implementation lying around, the Thrust library would be a good resource. (Look for thrust::scan)

Scan (or "parallel prefix sum") takes an input array and generates an output where each element is the sum of the inputs to that point: [1 5 3 7] => [1 6 9 16]

If you scan predicates (0 or 1 depending on an evaluated condition) where the predicate checks whether a given element the same as the preceding element, then you compute the output index of the element in question. Your example array

[0 0 0 1 1 2 2] [0 0 0 1 0 1 0] <= predicates [0 0 0 1 1 2 2] <= scanned predicates

Now you can use the scanned predicates as indices to write your output.

Upvotes: 1

pQB
pQB

Reputation: 3127

Although I've never used thrust library, what about this possible approach (simple but maybe effective):

int input[N];  // your sorted array
int offset[N]; // the offset of the first values of each elements. Initialized with -1

// each thread will check an index position
if (input[id] > input[id-1]) // bingo! here begins a new value
{
    int oid = input[id];  // use the integer value as index
    offset[oid] = id;     // mark the offset with the beginning of the new value
}

In your example the output will be:

[0,3,5]

But if the input array is:

[0,0,0,2,2,4,4]

Then the output will be:

[0,-1, 3, -1, 5]

Now, if thrust can do it for you, remove_if( offset[i] == -1 ) and compact the array.

This approach will waste lot of memory for the offset array, but as you dont know how many offset you are going to find, the worst case will use as much memory as the input array.

On the other hand, the few instruction per thread compared to the global memory load will limit this implementation by memory bandwidth. There are some optimization for this case as process some values per thread.

My 2 cents!

Upvotes: 4

Wisdom&#39;s Wind
Wisdom&#39;s Wind

Reputation: 1448

Good question and the answer depends on what you need to do with it after. Let me explain.

As soon as this problem can be solved in O(n) (where n is the input length) on CPU, you will suffer from memory allocation and copying (Host -> Device (input) and Device -> Host (result)) drawbacks. This will leads to performance degradation against simple CPU solution.

Even if your array already in device memory, each computation block need to read it to local or registers (at least access device memory), and it can't be done significantly faster than on CPU.

In general CUDA accelerate perfomance well if:

  1. Asymptotic complexity of computations is high comparing to input data length. For example input data length is n and complexity is O(n^2) or O(n^3).

  2. There is way to split task to independed or weak depended subtasks.

So if I was you, I would not try to do computations of such kind on CUDA if it's possible. And if it must be some standalone function or output format convertion for some other function I would do in CPU.

If it's part of some more complex algorithm the answer is more complicated. If I was on your place I would try to somehow change [0,3,5] format, because it adds limitations for utilization CUDA computation power. You can't effectively split your task on independed blocks. Just for example if I process 10 integers in one computation thread and next 10 integers in other. The second one don't know where to place his outputs until first one not finished. May be I will split an array on subarrays and store answer for each subarray separately. It's highly depends on what computations you are doing.

Upvotes: 0

Related Questions