Reputation: 321
I get a compile-time error in the instantiation of Thrust's templates when attempting the following:
thrust::copy_if(deviceEntries.begin(), deviceEntries.end(),
                hostResultBuffer->begin(), IsEntrySelected(rootLayer));
The definition of IsEntrySelected:
struct IsEntrySelected : thrust::unary_function<Entry, bool> {
    inline IsEntrySelected(const unsigned long int layer) : _layer(layer) {}

    __device__ __host__
    inline bool operator()(const Entry& val) const {
        return val.selected && val.layer == _layer;
    }

private:
    unsigned long int _layer;
};
Is this operation possible? I have been able to make the same call work by placing the result in an intermediate device_vector buffer on the GPU instead of copying directly to the host buffer, but I would like to avoid that to save GPU memory. Is there another way to filter and copy conditionally to the host that avoids an additional GPU buffer?
Upvotes: 0
Views: 935
Reputation: 151799
Is this operation possible?
No, it's not possible to use copy_if this way.
There are no CUDA device->host copy operations that can copy an arbitrarily scattered array to a compacted array. Therefore the only way Thrust could accomplish this on the CUDA backend would be to create an intermediate array on the device, perform the compaction into it, and then issue a cudaMemcpy to effect the device->host transfer (which, as you've discovered, it doesn't do). So there would be no way around a temporary array, even if Thrust did it "automatically" for you (which it doesn't).
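Spelled out explicitly, that two-step path looks something like the sketch below (assuming the Entry and IsEntrySelected definitions from the question, and a device_vector named deviceEntries; this is essentially the intermediate-buffer approach you already have):

```cpp
// Sketch: explicit device-side compaction, then one device->host transfer.
// Assumes Entry, IsEntrySelected, deviceEntries, and rootLayer from the question.
thrust::device_vector<Entry> deviceTemp(deviceEntries.size());

// Compact on the device into the temporary buffer; the returned iterator
// marks the end of the selected elements.
auto selectedEnd = thrust::copy_if(deviceEntries.begin(), deviceEntries.end(),
                                   deviceTemp.begin(), IsEntrySelected(rootLayer));

// Copy only the compacted portion to the host.
thrust::host_vector<Entry> hostResult(selectedEnd - deviceTemp.begin());
thrust::copy(deviceTemp.begin(), selectedEnd, hostResult.begin());
```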
If space is at an absolute premium, then copy the array to the host intact and do the stream compaction there. But for performance reasons, I expect it would usually be better to do the stream compaction on the device and then transfer the (presumably smaller) array to the host.
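If you take the space-saving route, the host-side compaction needs nothing from Thrust; plain std::copy_if does it. A minimal sketch, assuming an Entry type with only the two fields the question's functor uses (filterOnHost is a hypothetical helper name):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Minimal stand-in for the question's Entry type; only the two fields
// read by IsEntrySelected are assumed here.
struct Entry {
    bool selected;
    unsigned long layer;
};

// Host-side stream compaction: after copying the device array to the host
// intact, filter it with std::copy_if -- no extra GPU buffer is needed.
std::vector<Entry> filterOnHost(const std::vector<Entry>& hostCopy,
                                unsigned long rootLayer) {
    std::vector<Entry> result;
    std::copy_if(hostCopy.begin(), hostCopy.end(), std::back_inserter(result),
                 [rootLayer](const Entry& e) {
                     return e.selected && e.layer == rootLayer;
                 });
    return result;
}
```

The trade-off stands: this touches no extra device memory, but the full (uncompacted) array crosses the PCIe bus, which is usually the slower option.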
Upvotes: 2