SenfMeister

Reputation: 63

Define a variable-size array in local memory using CUDA

Is it possible to create a list or array inside a device function whose size is a parameter of the call, or a global variable that is initialized at call time?

I would like one of these lists to work:

unsigned int size1;

__device__ void function(int size2) {

    int list1[size1];
    int list2[size2];
}

Is it possible to do something smart to make something like this work?

Upvotes: 6

Views: 3784

Answers (3)

KoppeKTop

Reputation: 648

There is one way to allocate a dynamic amount of shared memory: pass the size in bytes as the third kernel launch parameter:

__global__ void kernel (int * arr) 
{
    extern __shared__ int buf []; // size is not stated
    // copy data to shared mem:
    buf[threadIdx.x] = arr[blockIdx.x * blockDim.x + threadIdx.x];
    // . . . 
}
// . . . 
// launch kernel, set size of shared mem in bytes (k elements in buf):
kernel<<<grid, threads, k * sizeof(int)>>> (arr);

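There is a hack for multiple arrays. Every extern __shared__ declaration refers to the start of the same buffer, so the second list has to be indexed with an offset: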

__device__ void function(int * a, int * b, int k) // k elements in first list
{
    extern __shared__ int list1 [];
    extern __shared__ int list2 []; // list2 starts at the same address as list1

    // list1 occupies elements [0, k); list2 is offset by k so the two do not overlap
    list1 [threadIdx.x] = a[blockIdx.x * blockDim.x + threadIdx.x];
    list2 [k + threadIdx.x] = b[blockIdx.x * blockDim.x + threadIdx.x];
    // . . .
}

Keep in mind that this memory is allocated once per block and shared by all of its threads, so the size passed at launch must cover every array.
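A slightly tidier variant of the same hack is to declare the extern buffer once and carve it up with pointers. The following is only a minimal sketch: the kernel name twoLists, the out array, and the sizes are made up for illustration, and it assumes the input length is a multiple of the block size and that k is at least blockDim.x.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: two per-block lists carved out of one dynamic shared buffer.
// Assumes gridDim.x * blockDim.x equals the input length and k >= blockDim.x.
__global__ void twoLists(const int *a, const int *b, int *out, int k)
{
    extern __shared__ int smem[];   // total size comes from the third launch parameter
    int *list1 = smem;              // elements [0, k)
    int *list2 = smem + k;          // elements [k, k + blockDim.x)

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    list1[threadIdx.x] = a[gid];
    list2[threadIdx.x] = b[gid];
    __syncthreads();                // make both lists visible to the whole block

    // Read a neighbour's slot to show that the lists really are shared per block.
    int next = (threadIdx.x + 1) % blockDim.x;
    out[gid] = list1[threadIdx.x] + list2[next];
}

int main()
{
    const int threads = 128, blocks = 4, n = threads * blocks;
    const int k = threads;          // length of the first list
    int h_a[n], h_b[n], h_out[n];
    for (int i = 0; i < n; ++i) { h_a[i] = i; h_b[i] = 10 * i; }

    int *d_a, *d_b, *d_out;
    cudaMalloc(&d_a, n * sizeof(int));
    cudaMalloc(&d_b, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_a, h_a, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, n * sizeof(int), cudaMemcpyHostToDevice);

    // The shared memory request must cover both lists.
    size_t shmemBytes = (k + threads) * sizeof(int);
    twoLists<<<blocks, threads, shmemBytes>>>(d_a, d_b, d_out, k);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("out[0] = %d\n", h_out[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);
    return 0;
}
```

If the lists had different element types, the offsets would also need to respect each type's alignment.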

Upvotes: 4

karlphillip

Reputation: 93410

Of course it is possible!

Take a look at the source code of this project: http://code.google.com/p/cuda-grayscale/

This function is called from main() and performs grayscale conversion on gpu_image based on its width and height: cuda_grayscale(gpu_image, width, height, grid, block);

If you dig a little, you'll find the implementation in kernel_gpu.cu:

__global__ void grayscale(float4* imagem, int width, int height)
{
    const int i = blockIdx.x * (blockDim.x * blockDim.y) + blockDim.x * threadIdx.y + threadIdx.x;

    if (i < width * height)
    {
        float v = 0.3 * imagem[i].x + 0.6 * imagem[i].y + 0.1 * imagem[i].z;
        imagem[i] = make_float4(v, v, v, 0);
    }
}

Upvotes: -1

Anycorn

Reputation: 51465

If you know which values of size to expect, consider using C++ templates. Together with the Boost preprocessor library, you can easily generate multiple instances/entry points.

The other thing you can do is dynamically allocate shared memory and assign pointers manually. Obviously, this may not work if you require thread-private memory in excess of the available shared memory.

I can provide a link if you would like to see an example.
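To illustrate the template suggestion, here is a rough sketch in which the array size is a template parameter, so each instantiation gets an ordinary fixed-size per-thread array. The names sumFirst, sumKernel and launchForSize are hypothetical, and a plain switch stands in for the Boost preprocessor machinery mentioned above; it only works for the sizes listed in the switch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Per-thread array whose size N is fixed at compile time for each instantiation.
template <int N>
__device__ int sumFirst(const int *src)
{
    int list[N];                        // thread-private, size known to the compiler
    for (int i = 0; i < N; ++i) list[i] = src[i];
    int s = 0;
    for (int i = 0; i < N; ++i) s += list[i];
    return s;
}

template <int N>
__global__ void sumKernel(const int *src, int *out)
{
    out[threadIdx.x] = sumFirst<N>(src);
}

// Host-side dispatch from a runtime size to a pre-instantiated kernel.
// Returns false for sizes that were not instantiated.
bool launchForSize(int size, const int *src, int *out, int threads)
{
    switch (size) {
    case 4:  sumKernel<4><<<1, threads>>>(src, out);  return true;
    case 8:  sumKernel<8><<<1, threads>>>(src, out);  return true;
    case 16: sumKernel<16><<<1, threads>>>(src, out); return true;
    default: return false;
    }
}

int main()
{
    const int maxSize = 16, threads = 32;
    int h_src[maxSize], h_out[threads];
    for (int i = 0; i < maxSize; ++i) h_src[i] = i + 1;

    int *d_src, *d_out;
    cudaMalloc(&d_src, maxSize * sizeof(int));
    cudaMalloc(&d_out, threads * sizeof(int));
    cudaMemcpy(d_src, h_src, maxSize * sizeof(int), cudaMemcpyHostToDevice);

    if (!launchForSize(8, d_src, d_out, threads)) {
        printf("unsupported size\n");
        return 1;
    }
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, threads * sizeof(int), cudaMemcpyDeviceToHost);
    printf("out[0] = %d\n", h_out[0]);

    cudaFree(d_src); cudaFree(d_out);
    return 0;
}
```

The trade-off is that every supported size produces a separate compiled kernel, so this only suits a small, known set of sizes.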

Upvotes: 0
