Reputation: 1314
I'm declaring a global variable myvar
on the device using the __device__
specifier. I never set it to a meaningful value (I'm not using cudaMemcpyToSymbol before the kernel launch, as you normally would).
I'd expect the value of myvar
to be random garbage, but it's neatly 0.0 every time. Does CUDA auto-initialise device variables?
I've also checked it with the CUDA debugger, and the value is indeed 0.
#include <cstdio>

__device__ float myvar;

__global__ void kernel(){
    printf("my var: %f\n", myvar);
}

int kernel_launch(){
    kernel<<<1,5>>>();
    cudaDeviceSynchronize();
    return 0;
}
Upvotes: 0
Views: 2458
Reputation: 5188
Uninitialized __device__
variables, much like their uninitialized global __host__
counterparts, still need to be laid out in the executable: the toolchain records their size and location in memory. As far as I know, such definitions are always emitted with a placeholder value, which unsurprisingly appears to be zero.
This can be checked readily. For example this command disassembles the output of a simple __device__ int a;
declaration:
nvcc -o test.o -c -x cu - <<< "__device__ int a;" && cuobjdump -xelf all test.o && nvdisasm *cubin
You'll get the following output:
.headerflags @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"
//--------------------- .nv.constant14 --------------------------
.section .nv.constant14,"a",@progbits
.align 4
.align 8
.nv.constant14:
/*0000*/ .dword a
//--------------------- .nv.global --------------------------
.section .nv.global,"aw",@nobits
.align 4
.type a,@object
.size a,(.L_1 - a)
a:
.nv.global:
.zero 4
.L_1:
where you can clearly see the implicit zero initialization.
However, I believe it would be unsafe to rely on this.
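If a defined starting value matters, the safe options are a static initializer or an explicit copy from the host before the launch, rather than counting on the implicit zero. A minimal sketch (the variable and kernel names are mine, for illustration):

```cuda
#include <cstdio>

// Option 1: static initializer, embedded in the module image.
__device__ float scale = 2.5f;

__global__ void show() {
    printf("scale: %f\n", scale);
}

int main() {
    // Option 2: set the symbol explicitly from the host before the launch.
    float v = 3.5f;
    cudaMemcpyToSymbol(scale, &v, sizeof(v));
    show<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Either way, the value of the variable is then specified by you rather than by a toolchain placeholder.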
Upvotes: 0
Reputation: 15734
CUDA does not automatically initialize any variables. It's just an implementation-dependent coincidence that myvar
comes out as zero in your test app.
In IEEE-754 floating point (used by NVIDIA GPUs), an all-zero bit pattern corresponds to 0.0, so it's a much more likely "random" value than, say, 1.0f.
Don't draw conclusions about the contents of all your GPU memory from the value of that single word...
I did a small experiment and was slightly surprised by the result, though. I initialized myvar
with __device__ float myvar(1.1f);
and altered the printf()
so that it prints both the value and the address of the variable. Then I ran it, got 1.1f
as output, and noted the address. Then I removed the initializer and ran it again. This time the value went back to 0.0f
while the address stayed the same, showing that the chunk of memory in which this variable lives does get zeroed out as part of regular CUDA operations. For instance, this could happen if the CUDA program is copied to the GPU in a fixed-size chunk in which the remaining data is zero, and myvar
is assigned an address within this chunk.
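For reference, the experiment described above can be sketched like this (my reconstruction, not the poster's exact code):

```cuda
#include <cstdio>

// Run 1: with the initializer. Run 2: remove it, i.e. plain `__device__ float myvar;`.
__device__ float myvar(1.1f);

__global__ void kernel() {
    // Print both the value and the address of the variable.
    printf("my var: %f at %p\n", myvar, (void *)&myvar);
}

int main() {
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

On the poster's setup, the address stayed the same across both runs while the value dropped from 1.1f back to 0.0f.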
Upvotes: 2