Dave Durbin

Reputation: 3632

How to define constant arrays of dim3 structs in CUDA

I am writing some CUDA code to run on the device. The code will use two lookup tables of constant values. The first of these is an array of 256 unsigned integers, which I declare as:

__constant__ 
uint16_t edgeTable[256]={
   0x000,
   0x019,
   ... etc.
};

And this seems to compile fine.

The second is a fixed size array of dim3 and I tried this:

__constant__
dim3 offsets[8] = {
    {0, 0, 0}, {0, 0, 1}, {0, 1, 0},
    ... etc 
};

The compiler objects to this with the error message:

error: dynamic initialization is not supported for __device__, __constant__ and __shared__ variables.

Perhaps I misunderstand dynamic initialisation, but it seems to me that this is static initialisation: the compiler can work out the sizes of everything and all values are provided.

What am I missing here?

How can I achieve what I'm trying to do?

Thanks

I'm using the CUDA 7.5 toolkit on Ubuntu 14.04 with gcc 4.8.4.

Upvotes: 1

Views: 1284

Answers (1)

talonmies

Reputation: 72349

The important feature of this problem is that CUDA uses a C++ compilation model, and dim3 is treated as a class with its own constructor rather than as a plain struct. So while:

dim3 foo = {1,1,1};

is legal in C++11, because of parameterised constructor initialisation support, this:

__constant__ dim3 foo = {1,1,1};

isn't, because that implies dynamic initialisation of a constant memory object, and the CUDA execution model doesn't permit that.

If the constant memory aspect is important to you and you want the convenience of dim3, you could do something like this:

#include <cstdio>

__constant__ int offsets[3*8];

__global__ void kernel()
{
    if (threadIdx.x < 8) {
        dim3 val = *reinterpret_cast<dim3*>(&offsets[3*threadIdx.x]);
        // threadIdx.x and the dim3 members are unsigned, so print them with %u
        printf("%u (%u,%u,%u)\n", threadIdx.x, val.x, val.y, val.z);
    }
}

void setup_offsets()
{
    // This requires C++11 support
    dim3 ovals[8] = { {0,0,0}, 
                      {1,0,0}, {0,1,0}, {0,0,1},
                      {1,1,0}, {1,0,1}, {0,1,1},
                      {1,1,1} };

    cudaMemcpyToSymbol(offsets, &ovals[0], sizeof(ovals));
}

int main(void)
{
    setup_offsets();
    kernel<<<1,8>>>();
    cudaDeviceSynchronize();
    cudaDeviceReset();
    return 0;
}

which is a bit hacky, but probably the best you can hope for under the circumstances. Looking at PTX for that code, the compiler has correctly emitted ld.const.u32 to fetch each member of the dim3.
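
If the dim3 type itself is not essential, one alternative worth trying is to store the table as uint3. This is a sketch rather than a tested fix for the toolkit in the question: it assumes that uint3, being a plain struct of three unsigned ints with no constructor, can be brace-initialised statically in constant memory, and that a dim3 can be built from a uint3 in device code. The names copy_offsets and fetch_offsets are made up for illustration, and the table values are the ones used in the example above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// uint3 is a plain aggregate (no constructor), so this brace initialisation
// is static and should be accepted for a __constant__ variable.
__constant__ uint3 offsets[8] = {
    {0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {0, 0, 1},
    {1, 1, 0}, {1, 0, 1}, {0, 1, 1}, {1, 1, 1}
};

// Hypothetical kernel: reads each entry from constant memory, converts it
// to dim3, and writes the components back out to global memory.
__global__ void copy_offsets(uint3 *out)
{
    if (threadIdx.x < 8) {
        dim3 val(offsets[threadIdx.x]);   // dim3 can be constructed from a uint3
        out[threadIdx.x] = make_uint3(val.x, val.y, val.z);
    }
}

// Hypothetical host helper: runs the kernel and copies the 8 entries back.
cudaError_t fetch_offsets(uint3 host_out[8])
{
    uint3 *dev_out = 0;
    cudaError_t err = cudaMalloc(&dev_out, 8 * sizeof(uint3));
    if (err != cudaSuccess) return err;

    copy_offsets<<<1, 8>>>(dev_out);
    err = cudaGetLastError();             // catches launch failures
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying
        err = cudaMemcpy(host_out, dev_out, 8 * sizeof(uint3),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_out);
    return err;
}

int main(void)
{
    uint3 vals[8];
    cudaError_t err = fetch_offsets(vals);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < 8; ++i)
        printf("%d (%u,%u,%u)\n", i, vals[i].x, vals[i].y, vals[i].z);
    return 0;
}
```

This avoids both the reinterpret_cast and the runtime cudaMemcpyToSymbol call, at the cost of converting to dim3 at the point of use.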

Upvotes: 4
