Reputation: 31
I've written kernels in plain CUDA code that do not use thrust device vectors. The kernels write their results into an array on the device, say array X. I'd now like to do a reduction on X. Is there a way to use the thrust::reduce function without first copying X into a thrust::device_vector variable?
Upvotes: 2
Views: 403
Reputation: 72349
The traditional way to do this is to wrap the device pointer in a thrust::device_ptr
and pass that to the Thrust algorithms. Thrust's tag-based template model will then dispatch execution on the device, based on the type of the input sequence supplied in the call.
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
int* X;
cudaMalloc((void **)&X, sizeof(int) * size_t(N));
// Do stuff with X
thrust::device_ptr<int> X_ptr(X);
int result = thrust::reduce(X_ptr, X_ptr+N);
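The wrapping also works in the other direction: if a later kernel launch needs the raw pointer again, thrust::raw_pointer_cast recovers it from the device_ptr. A minimal sketch of mixing the two styles, where the double_all kernel and the value of N are assumptions for illustration:

```cuda
#include <cstdio>
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>

// Hypothetical kernel that doubles each element in place
__global__ void double_all(int* data, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) data[i] *= 2;
}

int main()
{
    const int N = 1000;                        // assumed size for illustration
    int* X;
    cudaMalloc((void**)&X, sizeof(int) * size_t(N));

    thrust::device_ptr<int> X_ptr(X);          // wrap for Thrust algorithms
    thrust::fill(X_ptr, X_ptr + N, 1);         // sets X[i] = 1 on the device

    // raw_pointer_cast recovers the raw pointer for an ordinary kernel launch
    double_all<<<(N + 255) / 256, 256>>>(thrust::raw_pointer_cast(X_ptr), N);

    int result = thrust::reduce(X_ptr, X_ptr + N);
    printf("sum = %d\n", result);              // 1000 elements, each now 2
    cudaFree(X);
    return 0;
}
```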
From Thrust 1.7 onwards, the concept of execution policies has been introduced. This removes the need to explicitly wrap the device address in a device_ptr
. Instead you can pass the thrust::device
policy to indicate that the input iterators are on the device, and do something like
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>
int* X;
cudaMalloc((void **)&X, sizeof(int) * size_t(N));
// Do stuff with X
int result = thrust::reduce(thrust::device, X, X+N);
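Putting the execution-policy approach together, a complete compilable sketch might look like the following; the fill_ones kernel and the value of N are assumptions standing in for your own kernel and array size:

```cuda
#include <cstdio>
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>

// Trivial kernel standing in for "Do stuff with X": writes 1 to every element
__global__ void fill_ones(int* X, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) X[i] = 1;
}

int main()
{
    const int N = 1 << 20;                     // assumed array size
    int* X;
    cudaMalloc((void**)&X, sizeof(int) * size_t(N));

    fill_ones<<<(N + 255) / 256, 256>>>(X, N);

    // Reduce directly over the raw device pointer using the device policy
    int result = thrust::reduce(thrust::device, X, X + N);

    printf("sum = %d\n", result);              // one per element
    cudaFree(X);
    return 0;
}
```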
Which approach you choose should be guided by which version of Thrust you have and which code style you prefer.
Upvotes: 4