Reputation: 41
Here is the problem: There number of arrays, for example, 2000 arrays, but only 256 integers in each array. And the range of the integers is quite considerable, [0, 1000000] for instance.
I want to get the unique elements for each array, in other words, remove the duplicate elements. I have 2 solutions:
Use Thrust to get the unique element for every array, so I have to do 2000 times thrust::unique
. But each array is pretty small, this way may not get a good performance.
Implement hash table in cuda kernel, use 2000 blocks, 256 thread in each block. And make use of the shared memory to implement to hash table, then every single block will produce a element-unique array.
The above two methods seem unprofessional, are there elegant ways to solve the problem by CUDA ?
Upvotes: 0
Views: 2658
Reputation: 16354
You can use thrust::unique
if you modify your data similar like it is done in this SO question: Segmented Sort with CUDPP/Thrust
For simplification, let's assume each array contains per_array
elements and there is a total of array_num
arrays. Each element is in the range [0,max_element]
.
Demo data
with per_array=4
, array_num=3
and max_element=2
could look like this:
data = {1,0,1,2},{2,2,0,0},{0,0,0,0}
To denote the membership of each element to the respective array we use the following flags
:
flags = {0,0,0,0},{1 1 1 1},{2,2,2,2}
In order to get unique elements per array of the segmented dataset we need to do the following steps:
Transform data
so the elements of each array i
are within the unique range [i*2*max_element,i*2*max_element+max_element]
data = data + flags*2*max_element
data = {1,0,1,2},{6,6,4,4},{8,8,8,8}
Sort the transformed data:
data = {0,0,1,2},{4,4,6,6},{8,8,8,8}
Apply thrust::unique_by_key
using data
as keys and flags
as values:
data = {0,1,2}{4,6}{8}
flags = {0,0,0}{1,1}{2}
Transform data
back to the original values:
data = data - flags*2*max_element
data = {0,1,2}{0,2}{0}
The maximum value of max_element
is bounded by the size of the integer used for representing data
. If it is an unsigned integer with n
bits:
max_max_element(n,array_num) = 2^n/(2*(array_num-1)+1)
Given your array_num=2000
, you will get the following limits for 32bit and 64bit unsigned integers:
max_max_element(32,2000) = 1074010
max_max_element(64,2000) = 4612839228234447
The following code implements the above steps:
unique_per_array.cu
#include <thrust/device_vector.h>
#include <thrust/extrema.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/sort.h>
#include <thrust/unique.h>
#include <thrust/copy.h>
#include <iostream>
#include <cstdint>
#define PRINTER(name) print(#name, (name))
template <template <typename...> class V, typename T, typename ...Args>
void print(const char* name, const V<T,Args...> & v)
{
std::cout << name << ":\t";
thrust::copy(v.begin(), v.end(), std::ostream_iterator<T>(std::cout, "\t"));
std::cout << std::endl;
}
int main()
{
typedef uint32_t Integer;
const std::size_t per_array = 4;
const std::size_t array_num = 3;
const std::size_t total_count = array_num * per_array;
Integer demo_data[] = {1,0,1,2,2,2,0,0,0,0,0,0};
thrust::device_vector<Integer> data(demo_data, demo_data+total_count);
PRINTER(data);
// if max_element is known for your problem,
// you don't need the following operation
Integer max_element = *(thrust::max_element(data.begin(), data.end()));
std::cout << "max_element=" << max_element << std::endl;
using namespace thrust::placeholders;
// create the flags
// could be a smaller integer type as well
thrust::device_vector<uint32_t> flags(total_count);
thrust::counting_iterator<uint32_t> flags_cit(0);
thrust::transform(flags_cit,
flags_cit + total_count,
flags.begin(),
_1 / per_array);
PRINTER(flags);
// 1. transform data into unique ranges
thrust::transform(data.begin(),
data.end(),
thrust::counting_iterator<Integer>(0),
data.begin(),
_1 + (_2/per_array)*2*max_element);
PRINTER(data);
// 2. sort the transformed data
thrust::sort(data.begin(), data.end());
PRINTER(data);
// 3. eliminate duplicates per array
auto new_end = thrust::unique_by_key(data.begin(),
data.end(),
flags.begin());
uint32_t new_size = new_end.first - data.begin();
data.resize(new_size);
flags.resize(new_size);
PRINTER(data);
PRINTER(flags);
// 4. transform data back
thrust::transform(data.begin(),
data.end(),
flags.begin(),
data.begin(),
_1 - _2*2*max_element);
PRINTER(data);
}
Compiling and running yields:
$ nvcc -std=c++11 unique_per_array.cu -o unique_per_array && ./unique_per_array
data: 1 0 1 2 2 2 0 0 0 0 0 0
max_element=2
flags: 0 0 0 0 1 1 1 1 2 2 2 2
data: 1 0 1 2 6 6 4 4 8 8 8 8
data: 0 1 1 2 4 4 6 6 8 8 8 8
data: 0 1 2 4 6 8
flags: 0 0 0 1 1 2
data: 0 1 2 0 2 0
One more thing:
In the thrust development version there is an improvement implemented for thrust::unique*
which improves performance by around 25 %. You might want to try this version if you aim for better performance.
Upvotes: 4