Reputation: 1031
__device__ static char Tc0[] = {'0','\0'};
__device__ static char Tc1000[] = {'1','0','0','0','\0'};
__device__ static char Tc1000th[] = {'1','0','0','0','t','h','\0'};
__device__ static char Tc100[] = {'1','0','0','\0'};
__device__ static char Tc100th[] = {'1','0','0','t','h','\0'};
// ... 20000+ similar lines follow ...
__device__ static char Tczymolytic[] = {'z','y','m','o','l','y','t','i','c','\0'};
__device__ static char Tczymotic[] = {'z','y','m','o','t','i','c','\0'};
int main()
{
}
Compile:
nvcc ./test2.cu
Besides many warnings about unused variables, I got the following error:
ptxas error : File uses too much global constant data (0x29e58 bytes, 0x10000 max)
What does CUDA use constant memory for here? Is it possible to fix this?
As @talonmies pointed out, it works with the following compilation command:
nvcc -w -std=c++11 -arch=sm_52 -cubin ./test2.cu
The crucial option here is -arch=sm_52.
Upvotes: 2
Views: 977
Reputation: 72352
In general, what you are doing is legal and should work.
However, it appears that on the now-deprecated Fermi architecture (sm_20 and sm_21), the assembler attempts to place the initializing values for statically defined and initialized device variables in constant memory, which has a 64 KB size limit. On newer, supported architectures, this doesn't happen.
Because you are using CUDA 7.5, whose default compilation target is sm_20, compilation will fail once the size of those symbols exceeds 64 KB unless you specify an architecture for which the assembler emits static device declarations into global memory.
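To make the numbers in the error message concrete, here is a minimal sketch that decodes the two hex sizes from the ptxas line quoted in the question. The helper name `parse_constant_data_error` and the regular expression are my own illustration, not part of any CUDA tool, and they assume the message keeps the exact "(0x... bytes, 0x... max)" shape shown above.

```python
import re

# Hypothetical helper (not part of the CUDA toolkit): extracts the two hex
# sizes from a ptxas "too much global constant data" message.
# Assumption: the message contains "(0x<used> bytes, 0x<limit> max)".
SIZE_PATTERN = re.compile(r"\(0x([0-9a-fA-F]+) bytes, 0x([0-9a-fA-F]+) max\)")


def parse_constant_data_error(message):
    """Return (used_bytes, max_bytes) as integers, or None if no match."""
    match = SIZE_PATTERN.search(message)
    if match is None:
        return None
    used, limit = (int(group, 16) for group in match.groups())
    return used, limit


# The error line from the question, copied verbatim.
message = ("ptxas error : File uses too much global constant data "
           "(0x29e58 bytes, 0x10000 max)")

used, limit = parse_constant_data_error(message)
print("used: %d bytes, limit: %d bytes (%d KB)" % (used, limit, limit // 1024))
print("over the limit by %d bytes" % (used - limit))
```

The second value, 0x10000, is 65536 bytes, which is the 64 KB constant memory limit; the first value is how much initialized data the assembler tried to place there.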
As an example:
$ cat make_silly.py
for i in range(0, 100000):
    print("__device__ static char tx%05d[] = {'0','1','2','3','5','6','7','8'};" % i)
print("")
print("int main() { return 0; }")
$ python make_silly.py > make_silly.cu
$ tail -20 make_silly.cu
__device__ static char tx99982[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99983[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99984[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99985[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99986[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99987[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99988[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99989[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99990[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99991[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99992[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99993[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99994[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99995[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99996[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99997[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99998[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99999[] = {'0','1','2','3','5','6','7','8'};
int main() { return 0; }
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
$ nvcc -w -std=c++11 -arch=sm_30 -Xptxas="-v --disable-optimizer-constants" -cubin make_silly.cu
ptxas info : 800000 bytes gmem
$ nvcc -w -std=c++11 -arch=sm_20 -Xptxas="-v --disable-optimizer-constants" -cubin make_silly.cu
ptxas error : File uses too much global constant data (0xc3500 bytes, 0x10000 max)
ptxas info : 800000 bytes gmem, 800000 bytes cmem[14]
Here you can see that compilation fails only for a compute 2.x target. For the higher compute capability target, the assembler happily emits 800,000 bytes of static global memory symbols.
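The sizes in that output can be sanity-checked with a little arithmetic. The sketch below assumes the arrays are packed back to back with no padding, which is consistent with the 800000 bytes ptxas reports for this example but is not something I would rely on in general; the function names are hypothetical.

```python
# The "max" value reported by ptxas in the error message: 65536 bytes.
CONSTANT_LIMIT = 0x10000


def total_bytes(num_arrays, bytes_per_array):
    # Assumption: arrays are laid out contiguously with no padding.
    return num_arrays * bytes_per_array


def max_arrays_within_limit(bytes_per_array, limit=CONSTANT_LIMIT):
    # Largest number of equally sized arrays that still fits under the limit.
    return limit // bytes_per_array


# make_silly.py emits 100000 arrays of 8 chars each.
size = total_bytes(100000, 8)
print("%d bytes = %s" % (size, hex(size)))
print("exceeds limit: %s" % (size > CONSTANT_LIMIT))
print("8-byte arrays that fit in the limit: %d" % max_arrays_within_limit(8))
```

Under that assumption, the total matches the 0xc3500 figure in the sm_20 error, and only a few thousand such arrays would fit before a compute 2.x build hits the limit.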
Upvotes: 4