Jeroen Baert

Reputation: 1314

Cuda global __device__ variable auto initialization

I'm declaring a global variable myvar on the device using the __device__ specifier. I never set it to a meaningful value (I'm not calling cudaMemcpyToSymbol in my kernel launch method, as you normally would).

I'd expect the value of myvar to be random garbage, but it's neatly 0.0 every time. Does CUDA do auto-initialisation of device variables?

I've also checked it with the CUDA debugger: the value really is 0.

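#include <cstdio>

// No initializer and no cudaMemcpyToSymbol: the value is never set explicitly.
__device__ float myvar;

__global__ void kernel(){
    printf("my var: %f\n", myvar);
}

int kernel_launch(){
    kernel<<<1,5>>>();       // 5 threads, each prints myvar
    cudaDeviceSynchronize(); // wait so the device printf output is flushed
    return 0;
}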

Upvotes: 0

Views: 2458

Answers (2)

Lorenzo Pistone

Reputation: 5188

Uninitialized __device__ variables, much like their global host counterparts, have to be recorded in the executable with their size and location in memory. As far as I know, such declarations always need a placeholder value, which unsurprisingly appears to be zero.

This can be checked readily. For example, this command compiles a simple __device__ int a; declaration and disassembles the result:

nvcc -o test.o -c -x cu - <<< "__device__ int a;" && cuobjdump -xelf all test.o && nvdisasm *cubin

You'll get the following output:

    .headerflags    @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"


//--------------------- .nv.constant14            --------------------------
    .section    .nv.constant14,"a",@progbits
    .align  4
    .align      8
.nv.constant14:
        /*0000*/    .dword  a


//--------------------- .nv.global                --------------------------
    .section    .nv.global,"aw",@nobits
    .align  4
    .type       a,@object
    .size       a,(.L_1 - a)
a:
.nv.global:
    .zero       4
.L_1:

where the .zero 4 directive in the .nv.global section clearly shows the implicit zero initialization.

However, I believe it would be unsafe to rely on this.
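If you'd rather not depend on the implicit zero, the usual route is to set the variable explicitly before any kernel reads it. Below is a minimal sketch of that, assuming the myvar declaration from the question; read_myvar and set_and_read_myvar are hypothetical names made up for this illustration, and the error handling is deliberately bare.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ float myvar;  // no initializer: do not assume any particular value

// Copies the current value of myvar into out so the host can inspect it.
__global__ void read_myvar(float* out) {
    *out = myvar;
}

// Hypothetical helper: writes value into myvar from the host, then reads it
// back through a kernel. Returns -1.0f if any CUDA call fails.
float set_and_read_myvar(float value) {
    float result = -1.0f;
    float* d_out = nullptr;

    // Give myvar a defined value before any kernel touches it.
    if (cudaMemcpyToSymbol(myvar, &value, sizeof(value)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) return -1.0f;

    read_myvar<<<1, 1>>>(d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = -1.0f;
    }
    cudaFree(d_out);
    return result;
}
```

Calling set_and_read_myvar(1.5f) should hand back 1.5f on a working device. A static initializer such as __device__ float myvar = 0.0f; is the other option when the value is known at compile time.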

Upvotes: 0

Roger Dahl

Reputation: 15734

CUDA does not automatically initialize any variables. That myvar ends up as zero in your test app is just a coincidence of the CUDA implementation.

In IEEE-754 floating point (used by NVIDIA GPUs), the all-zero bit pattern represents 0.0, so it's a much more likely "random" value than, say, 1.0f.
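To make the bit-pattern point concrete, here is a small sketch that reinterprets a 32-bit word as a float. float_from_bits is a hypothetical helper written for this illustration, not part of the CUDA API.

```cuda
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterprets a 32-bit pattern as an IEEE-754
// single-precision float. Usable from both host and device code.
__host__ __device__ inline float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof(f));  // copy the bytes instead of casting pointers
    return f;
}
```

float_from_bits(0x00000000u) is 0.0f, while 1.0f needs the specific pattern 0x3F800000u, which is why zeroed memory reads back as 0.0 rather than anything else.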

Don't infer the contents of all your GPU memory from the value of that single word...

I did a small experiment and was slightly surprised by the result, though. I initialized myvar with __device__ float myvar(1.1f); and altered the printf() so that it prints both the value and the address of the variable. I ran it, got 1.1 as output, and noted the address. Then I removed the initialization and ran it again. This time the value went back to 0.0 while the address stayed the same, showing that the chunk of memory in which this variable is located does get zeroed out as part of regular CUDA operations. This could happen, for instance, if the CUDA program is copied to the GPU in a fixed-size chunk whose remaining bytes are zero, and myvar is assigned an address within that chunk.
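For anyone who wants to repeat the experiment, something along these lines should do. It's a rough sketch rather than the exact code I ran; print_myvar and run_experiment are hypothetical names, and the initializer is toggled by hand between runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Run once as written, then once with `__device__ float myvar(1.1f);`
// and compare the printed value and address.
__device__ float myvar;

// Prints the value and the device address of myvar.
__global__ void print_myvar() {
    printf("myvar = %f at %p\n", myvar, (void*)&myvar);
}

// Hypothetical helper: prints the symbol's address as seen from the host,
// launches the kernel, and reports whether every CUDA call succeeded.
bool run_experiment() {
    void* addr = nullptr;
    if (cudaGetSymbolAddress(&addr, myvar) != cudaSuccess) return false;
    printf("host view of the address: %p\n", addr);

    print_myvar<<<1, 1>>>();
    // Synchronize so the device-side printf output is flushed.
    return cudaDeviceSynchronize() == cudaSuccess;
}
```

The address and value you see will depend on your GPU, driver and toolkit version, so treat any single run as an observation, not a guarantee.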

Upvotes: 2
