Martin W

Reputation: 111

thrust::binary_search fails at run time with execution policy specifying a user stream

thrust::binary_search segfaults for all but the default stream as far as I can tell. I can't find any information describing such a restriction in the documentation, so I'm hoping that an expert can enlighten me on proper usage.

Here is a simple example. The test code creates a vector of unsorted integers and copies it to the device through a thrust::device_vector. It then creates a stream and sorts on that stream, which works. However, if I pass an execution policy for that same stream to a binary search routine, I get a segfault. I need multiple streams to improve concurrency in a more complex case.

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cuda_runtime.h>
#include <iostream>
#include <vector>

int main(void)
{
   // Unsorted input, copied host -> device
   std::vector<int> data = {31, 16, 14, 55, 61, 18, 33, 88, 72};
   thrust::host_vector<int> H(data);
   thrust::device_vector<int> D = H;

   cudaStream_t stream;
   cudaStreamCreate(&stream);

   // Sorting on the user stream works
   thrust::sort(thrust::cuda::par.on(stream), D.begin(), D.end());
   // upper_bound on the default stream also works
   auto it1 = thrust::upper_bound(thrust::cuda::par, D.begin(), D.end(), 50);
   std::cout << "Test 1 = " << *it1 << std::endl;
   // But upper_bound on the user stream segfaults
   auto it2 = thrust::upper_bound(thrust::cuda::par.on(stream), D.begin(), D.end(), 50);
   std::cout << "Test 2 = " << *it2 << std::endl;

   cudaStreamDestroy(stream);
   return 0;
}

I'm using CUDA 9.1 on a compute capability 6.1 device.

thrust::upper_bound on the default stream works as expected, but thrust::upper_bound with the execution policy thrust::cuda::par.on(stream) segfaults. I can't find anything about this in the documentation. Is this expected behavior? Is there a workaround?
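
For reference, the value that both calls should print can be checked on the host with the standard library, since std::upper_bound has the same "first element strictly greater than the value" semantics. This is only a host-side sketch: the function name host_upper_bound_value and the -1 sentinel are made up for illustration and are not part of Thrust.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical host-only reference: sorts a copy of the input and returns
// the first element strictly greater than `value`.
// Returns -1 (an arbitrary sentinel chosen for this sketch) if there is none.
int host_upper_bound_value(std::vector<int> data, int value)
{
    std::sort(data.begin(), data.end());
    auto it = std::upper_bound(data.begin(), data.end(), value);
    return it == data.end() ? -1 : *it;
}
```

For the data in the example above and a search value of 50, this returns 55, which is what "Test 1" prints and what "Test 2" would be expected to print.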

Upvotes: 1

Views: 320

Answers (2)

Allison Piper

Reputation: 81

Just to follow up -- this bug is fixed by https://github.com/thrust/thrust/pull/1104.

Upvotes: 1

blelbach

Reputation: 486

I'm the maintainer of Thrust. This is an unfortunate bug from before my time due to an oversight in the new CUDA backend for Thrust introduced in CUDA 9.0. The TL;DR is that the new CUDA backend doesn't have specializations of any of the binary search algorithms, so the generic sequential fallback is used. For some reason, the generic fallback implementation explodes when a stream execution policy is passed through.

I'm looking into the cause of the second problem (the crash in the generic fallback), but the bigger concern is the first problem (no implementation of the binary search algorithms in the new backend). The fix won't make it into the next CUDA release, but it will hopefully be in the release after that. However, after the next CUDA release, the Thrust GitHub will be back in service, and I'll be able to deploy a fix through there.

Unfortunately at this time, I have no other workaround.

GitHub Issue 921 is tracking this bug.
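
Until a fixed Thrust is available, one possible stopgap is to bypass Thrust's host-side dispatch for the search and run a hand-written binary search in a one-thread kernel launched on the user stream. This is also sequential, like the fallback described above. The sketch below is an assumption, not something taken from the Thrust sources or verified against the affected releases; upper_bound_index and upper_bound_kernel are hypothetical names. The search routine itself is plain C++, so it can be checked on the host; the kernel wrapper is only compiled under nvcc.

```cpp
#include <cstddef>

// Mark the helper as callable from both host and device when compiled by nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: index of the first element in the sorted range
// [data, data + n) that is strictly greater than `value`; returns n if none.
HD inline std::size_t upper_bound_index(const int* data, std::size_t n, int value)
{
    std::size_t lo = 0;
    std::size_t hi = n;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (data[mid] <= value) {
            lo = mid + 1;   // answer lies to the right of mid
        } else {
            hi = mid;       // data[mid] is a candidate; keep it in range
        }
    }
    return lo;
}

#ifdef __CUDACC__
// Hypothetical one-thread kernel: runs the search on the device and writes
// the resulting index to device memory at `result`.
__global__ void upper_bound_kernel(const int* data, std::size_t n, int value,
                                   std::size_t* result)
{
    *result = upper_bound_index(data, n, value);
}
#endif
```

With the vector D from the question, the kernel could be launched as upper_bound_kernel<<<1, 1, 0, stream>>>(thrust::raw_pointer_cast(D.data()), D.size(), 50, d_result), where d_result is a device allocation the caller provides. The index is then copied back with cudaMemcpyAsync on the same stream, followed by cudaStreamSynchronize before it is read on the host.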

Upvotes: 1
