Reputation: 1271
I am running into an issue where when trying to use cudaMallocManaged() and thrust in the same CUDA 6 application thrust fails, even when thrust is not using any of the managed memory. Simply having an unused managed variable is enough to cause thrust to fail. I have created the following reproducer which I am testing on an NVIDIA Jetson TK1 running CUDA 6.0:
#include "thrust/device_ptr.h"
#include "thrust/sort.h"
__global__ void calculate_hash(uint *hash_values, uint *particle_ids, int length)
{
int i = blockIdx.x*blockDim.x + threadIdx.x;
if(i >= length)
return;
hash_values[i] = 1;
particle_ids[i] = i;
}
void hash_particles_gpu(uint *d_hash_values, uint *d_particle_ids, int length)
{
int block_size = 256;
int num_blocks = ceil(length/(float)block_size);
calculate_hash<<<num_blocks, block_size>>>(d_hash_values, d_particle_ids, length);
cudaDeviceSynchronize();
thrust::device_ptr<uint> keys(d_hash_values);
thrust::device_ptr<uint> values(d_particle_ids);
thrust::sort_by_key(keys, keys+length, values);
}
int main(int argc, char *argv[])
{
int length = 15;
int bytes;
#ifdef BROKE
int *m_int;
cudaMallocManaged((void**)&m_int, sizeof(int));
#endif
// Allocate uint hash value array
bytes = length*sizeof(unsigned int);
unsigned int * hash_values;
cudaMalloc((void**)&hash_values, bytes);
// Allocate uint particle ID array
bytes = length*sizeof(unsigned int);
unsigned int *particle_ids;
cudaMalloc((void**)&particle_ids, bytes);
hash_particles_gpu(hash_values, particle_ids, length);
}
When I compile and run:
$ nvcc -DBROKE -DTHRUST_DEBUG example.cu -o broke.exe
$ nvcc -DTHRUST_DEBUG example.cu -o fixed.exe
$ ./fixed.exe
$ ./broke.exe
terminate called after throwing an instance of 'thrust::system::system_error'
what(): synchronize: RakingReduction: unknown error
Abort
I have checked to make sure I do not have any errors up until this point and everything seems fine until I call sort_by_key. Any idea what is going on?
Upvotes: 1
Views: 516
Reputation: 1271
Thanks for the comments. I flashed the latest Linux for Tegra, 19.3, and it works now with Cuda 6.0. Looks like NVIDIA had a driver issue with L4T 19.2.
Upvotes: 1