Reputation: 103
I'm fairly new to CUDA programming, so please forgive me if this is a silly question.
In CUDA, I'm trying to populate a small device array B (~20000 int
elements) with the contents of a large device array A (~20 million int
elements). A contains mostly zeros, but has roughly 20,000 non-zero elements, located at random and unknown positions in the array. I'd like to fill B with the non-zero contents of A using CUDA. The order of the elements within B is not important.
I've looked at the SDK and found a number of "reduce" strategies, e.g., for parallel summing of an array, but each of these approaches reduces the array to a scalar, whereas I'm trying to "reduce" an array to a smaller array. Searching online hasn't yielded anything either. I'm not looking for full code, just some ideas/links on how to implement this. I'm using C, and if possible, I'd like to do this without using any C++ classes or structures.
Thank you in advance for your assistance.
Upvotes: 0
Views: 216
Reputation: 152113
What you're describing sometimes goes by the name *stream compaction*.
Thrust (e.g. `copy_if`) and CUB (e.g. `DeviceSelect`) offer options that should have relatively good performance.
If you did want to implement it yourself, stream compaction is typically built from a sequence of lower-level parallel operations, a key one being a prefix sum (scan). You can get an idea of the build-up of a simple parallel prefix sum (and stream compaction) in GPU Gems. I'm just adding this for informational purposes; I'm not suggesting you implement either stream compaction or a prefix sum yourself.
A complete stream compaction example using the GPU Gems prefix sum method is here. However, I strongly encourage anyone to use Thrust or CUB instead.
Upvotes: 2