Serge Rogatch

Reputation: 15020

__assume() seems to cause an internal error in NVCC: "Call has wrong number of parameters..."

I have the following code which causes an error in the context of a larger codebase:

struct Quadruple
{
    double _sum;
    double _corr;

    // Fast multiply by -1, 0 or +1 (doesn't need extended precision operations).
    __device__ inline Quadruple& MulSign(const int8_t sign);
};

// ...

__device__ inline Quadruple& Quadruple::MulSign(const int8_t sign)
{
    __assume(-1 <= sign && sign <= 1); // enable more optimization premises
    _sum *= sign;
    _corr *= sign;
    return *this;
}

// ...

int8_t y = /* ... read from memory ... */;
Quadruple a = /* ... computed here ... */;
Quadruple b = Quadruple(a).MulSign(y);

When compiling the code with CUDA 8.0 RC and MSVC++ 2013, I get the following error:

1>  Compiling CUDA source file Example.gpu.cu...
1>  
1>  D:\ExDir>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_21,compute_20\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include"     --keep-dir x64\Release -maxrregcount=0  --machine 64 --compile -cudart static     -D_DENSE_REP -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W4 /nologo /Ox /Zi  /MD " -o x64\Release\Example.gpu.cu.obj "D:\ExDir\Example.gpu.cu" 
1>ptxas C : /Users/ExUser/AppData/Local/Temp/tmpxft_00002ffc_00000000-4_CuSvm.gpu.ptx, line 2513; error : Call has wrong number of parameters
1>  ptxas fatal   : Ptx assembly aborted due to errors
1>  Example.gpu.cu
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V120\BuildCustomizations\CUDA 8.0.targets(599,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_21,compute_20\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include"     --keep-dir x64\Release -maxrregcount=0  --machine 64 --compile -cudart static     -D_DENSE_REP -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W4 /nologo /Ox /Zi  /MD " -o x64\Release\Example.gpu.cu.obj "D:\ExDir\Example.gpu.cu"" exited with code 255.

After commenting out __assume(-1 <= sign && sign <= 1);, the code compiles fine.

Is it a bug in NVCC, or is MSVC++'s __assume just not supported by NVCC, or am I doing something wrong? Is there a workaround (keeping __assume or an equivalent, of course)?

Upvotes: 2

Views: 189

Answers (2)

Johan

Reputation: 76537

Since CUDA 11.2, nvcc provides the __builtin_assume() intrinsic when gcc or clang is the host compiler. When the host compiler is cl.exe (MSVC), __assume() can be used as well.
The hint always works in __device__ code.

However, when the host compiler is gcc (i.e. neither MSVC nor clang), you cannot use __assume in host code, because gcc does not support it.

As a workaround, you can define it yourself using gcc's assume statement attribute; see: https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html#index-assume-statement-attribute

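#ifndef __assume
#if defined(__CUDA_ARCH__)
// Device compilation pass (nvcc 11.2+): use nvcc's built-in.
#define __assume(cond) do { __builtin_assume(cond); } while (0)
#elif defined(__GNUC__)
// Host pass with gcc: the assume statement attribute (needs gcc 13+).
#define __assume(cond) do { \
    __attribute__((assume(cond))); \
} while (0)
#endif
#endif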

For older versions of gcc that lack the assume attribute, I guess you can replace the __attribute__ line with if (!(cond)) __builtin_unreachable();, although it seems silly to pair a new version of nvcc with an old gcc.
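The __builtin_unreachable() substitution could look roughly like the sketch below. It is host-only C++ and makes some assumptions: ASSUME_FALLBACK is a made-up macro name, and the struct is a simplified stand-in for the Quadruple from the question. The assert is only there so that a violated assumption is caught in debug builds instead of silently becoming undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

// ASSUME_FALLBACK is an illustrative name, not a compiler built-in.
#if defined(__GNUC__) || defined(__clang__)
// If cond is false, control reaches __builtin_unreachable(), which is
// undefined behavior, so the optimizer may treat cond as always true.
#define ASSUME_FALLBACK(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
#define ASSUME_FALLBACK(cond) __assume(cond)
#else
#define ASSUME_FALLBACK(cond) ((void)0)  // unknown compiler: no hint
#endif

// Host-only stand-in for the question's Quadruple.
struct Quadruple
{
    double _sum;
    double _corr;

    // Multiply by -1, 0 or +1.
    Quadruple& MulSign(const std::int8_t sign)
    {
        assert(-1 <= sign && sign <= 1);           // checked unless NDEBUG is defined
        ASSUME_FALLBACK(-1 <= sign && sign <= 1);  // optimizer hint
        _sum *= sign;
        _corr *= sign;
        return *this;
    }
};
```

Calling MulSign with a value outside [-1, 1] in a build with NDEBUG defined is undefined behavior, exactly as it would be with __assume, so the hint is only safe when the caller really guarantees the range.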

See:
https://developer.nvidia.com/blog/boosting-productivity-and-performance-with-the-nvidia-cuda-11-2-c-compiler/

Note that all of MSVC, clang, gcc, and nvcc (11.2+) do support __builtin_assume_aligned.
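For illustration, __builtin_assume_aligned takes a pointer and an alignment in bytes and returns the same pointer, which the optimizer may then treat as aligned. A minimal host-side sketch, assuming a gcc or clang host compiler (sum_aligned16 is a hypothetical helper name):

```cpp
#include <cstddef>

// Sums n floats. The caller promises that data is 16-byte aligned.
// sum_aligned16 is an illustrative name, not part of any library.
inline float sum_aligned16(const float* data, std::size_t n)
{
    // Same address as data, but annotated as 16-byte aligned for the optimizer.
    const float* p = static_cast<const float*>(__builtin_assume_aligned(data, 16));

    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}
```

The built-in does not check anything at run time; passing a pointer that is not actually 16-byte aligned is undefined behavior, so the buffer should be declared with something like alignas(16).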

Starting with nvcc 11.3, __builtin_unreachable is also supported.

Upvotes: 1

talonmies

Reputation: 72349

__assume() is a Microsoft (and subsequently Intel icc) compiler intrinsic. It isn't part of the C++ language and it isn't supported in CUDA (or in GCC or Clang for that matter).

Upvotes: 4
