thrust::binary_search segfaults on every stream except the default one, as far as I can tell. I can't find anything in the documentation describing such a restriction, so I'm hoping an expert can enlighten me on proper usage.
Here is a simple example. This test code creates a vector of unsorted integers and copies it to the device using a Thrust device vector. It then creates a stream and sorts the data on that stream. However, if I pass an execution policy bound to that stream to a binary search routine, I get a segfault. I need multiple streams to improve concurrency in a more complex case, of course.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/system/cuda/execution_policy.h>
#include <iostream>
#include <vector>
int main(void)
{
    std::vector<int> data = {31, 16, 14, 55, 61, 18, 33, 88, 72};
    thrust::host_vector<int> H(data);
    thrust::device_vector<int> D = H;
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    thrust::sort(thrust::cuda::par.on(stream), D.begin(), D.end());
    // So far so good
    auto it1 = thrust::upper_bound(thrust::cuda::par, D.begin(), D.end(), 50);
    // Also good
    std::cout << "Test 1 = " << *it1 << std::endl;
    // But the next call seg faults
    auto it2 = thrust::upper_bound(thrust::cuda::par.on(stream), D.begin(), D.end(), 50);
    std::cout << "Test 2 = " << *it2 << std::endl;
    cudaStreamDestroy(stream);
    return 0;
}
I'm using CUDA 9.1 on a compute capability 6.1 device.
Upper bound on the default stream works as expected. Upper bound on a non-default stream, using the execution policy thrust::cuda::par.on(stream), segfaults. I can't find any wisdom about this in the documentation. Is this right? Is there a workaround?
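For reference, the value that both tests should print can be cross-checked on the host with the standard library alone. The sketch below is only a host-side sanity check under that assumption, not a Thrust call and not a fix for the segfault; host_upper_bound_index is a hypothetical helper name introduced here for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Thrust): sorts `data` in place on the host
// and returns the index of the first element strictly greater than `value`.
// Returns data.size() if no element is greater than `value`.
std::size_t host_upper_bound_index(std::vector<int>& data, int value)
{
    std::sort(data.begin(), data.end());
    auto it = std::upper_bound(data.begin(), data.end(), value);
    return static_cast<std::size_t>(it - data.begin());
}
```

With the data from the example above and a search value of 50, the sorted vector is {14, 16, 18, 31, 33, 55, 61, 72, 88}, so the helper returns index 5, which holds 55. That is the value the device-side upper_bound calls should dereference to.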
Upvotes: 1
Reputation: 81
Just to follow up -- this bug is fixed by https://github.com/thrust/thrust/pull/1104.
Upvotes: 1
Reputation: 486
I'm the maintainer of Thrust. This is an unfortunate bug from before my time due to an oversight in the new CUDA backend for Thrust introduced in CUDA 9.0. The TL;DR is that the new CUDA backend doesn't have specializations of any of the binary search algorithms, so the generic sequential fallback is used. For some reason, the generic fallback implementation explodes when a stream execution policy is passed through.
I'm looking into why the generic fallback crashes when given a stream execution policy, but the bigger concern is that the new backend has no implementation of the binary search algorithms at all. The fix won't make it into the next CUDA release, but it will hopefully be in the release after that. However, after the next CUDA release, the Thrust GitHub will be back in service, and I'll be able to deploy a fix through there.
Unfortunately at this time, I have no other workaround.
GitHub Issue 921 is tracking this bug.
Upvotes: 1