Reputation: 1137
From my understanding, CUDA 10.1 removed the shfl
instructions:
PTX ISA version 6.4 removes the following features:
Support for
shfl
and vote instructions without the.sync
qualifier has been removed for .targetsm_70 and higher. This support was deprecated since PTX ISA version 6.0 as documented in PTX ISA version 6.2.
What is the correct way to support shfl
future and past CUDA versions?
My current methods (shared below) result in the error using CUDA 10.1:
ptxas ... line 466727; error : Instruction 'shfl' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
template <typename T>
__device__ static __forceinline__
T _shfl_up(T var, unsigned int delta, int width=WARPSIZE, unsigned mask=MEMBERMASK)
{
#if (__CUDACC_VER_MAJOR__ >= 9)
var = __shfl_up_sync(mask, var, delta, width);
#else
var = __shfl_up(var, delta, width);
#endif
return var;
}
Also, I would like to add that one of the dependencies of my project is CUB and I believe they utilize the same method to split up _sync()
and older shfl
instructions. I am not sure what I am doing wrong.
Upvotes: 1
Views: 1309
Reputation: 1137
I was doing the right thing, turns out another dependency didn't have support for sync
, created a pull request for it: https://github.com/moderngpu/moderngpu/pull/32
template <typename T>
__device__ static __forceinline__
T _shfl_up(T var, unsigned int delta, int width=WARPSIZE, unsigned mask=MEMBERMASK)
{
#if ( __CUDA_ARCH__ >= 300)
#if (__CUDACC_VER_MAJOR__ >= 9)
var = __shfl_up_sync(mask, var, delta, width);
#else
var = __shfl_up(var, delta, width);
#endif
#endif
return var;
}
Upvotes: 3