landau

Reputation: 5851

Force CUDA's thrust::reduce to execute with no parallelism

I have a CUDA program that uses thrust::reduce to compute sums in parallel. For example:

thrust::device_ptr<double> tmp(aux);
double my_sum = thrust::reduce(tmp, tmp + G);

where double* aux points to G contiguous doubles on the device. I need to compare the runtime of the whole parallelized program to a version with no parallel computation. Is there a way to run thrust::reduce using only a single thread on the device? A global switch would be the most convenient option.

Upvotes: 2

Views: 147

Answers (1)

talonmies

Reputation: 72372

You should be able to do this by calling thrust::reduce inside a kernel with the sequential execution policy (thrust::seq) and then launching that kernel with a single thread. Something like:

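#include <thrust/reduce.h>
#include <thrust/execution_policy.h>

// thrust::seq makes the reduction run sequentially inside the calling thread.
__global__ void serial_reduce(double *result, double *aux, int G)
{
    *result = thrust::reduce(thrust::seq, aux, aux+G);
}

// Host side: managed memory lets the host read *result directly.
double *result;
cudaMallocManaged(&result, sizeof(double));
serial_reduce<<<1,1>>>(result, aux, G);  // one block, one thread
cudaDeviceSynchronize();                 // wait before reading *result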

[Note: written in browser and totally untested, use at own risk]
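Since the goal is a runtime comparison, here is a rough sketch of how the single-thread kernel above could be timed against the ordinary parallel call using CUDA events. The helper names serial_device_sum and parallel_device_sum are hypothetical (they are not part of Thrust or the original program), error checking is left out for brevity, and this is equally untested against your setup.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>

// Single-thread kernel: thrust::seq runs the reduction sequentially
// in the calling thread instead of spawning parallel work.
__global__ void serial_reduce(double *result, const double *aux, int G)
{
    *result = thrust::reduce(thrust::seq, aux, aux + G);
}

// Hypothetical helper: sums G doubles at device pointer aux with one
// device thread and writes the kernel time in milliseconds to *elapsed_ms.
double serial_device_sum(const double *aux, int G, float *elapsed_ms)
{
    double *d_result = nullptr;
    cudaMalloc(&d_result, sizeof(double));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    serial_reduce<<<1, 1>>>(d_result, aux, G);   // one block, one thread
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                  // wait for the kernel to finish
    cudaEventElapsedTime(elapsed_ms, start, stop);

    double h_result = 0.0;
    cudaMemcpy(&h_result, d_result, sizeof(double), cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_result);
    return h_result;
}

// Hypothetical helper: the same sum through the default (parallel)
// device path, timed the same way for comparison.
double parallel_device_sum(const double *aux, int G, float *elapsed_ms)
{
    thrust::device_ptr<const double> tmp(aux);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    double sum = thrust::reduce(tmp, tmp + G);   // blocks until the result is on the host
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(elapsed_ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return sum;
}
```

Call both helpers with the same aux and G and compare the two elapsed_ms values. The first CUDA call in a process also pays one-time initialization costs, so it is probably worth running each helper once as a warm-up before recording the numbers you care about.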

Upvotes: 6
