Blake

Reputation: 75

Variable Sizes Array in CUDA

Is there any way to declare an array such as:

int arraySize = 10;
int array[arraySize];

inside a CUDA kernel/function? I read in another post that I could declare the size of the shared memory in the kernel call and then I would be able to do:

int array[];

But I cannot do this. I get a compile error: "incomplete type is not allowed". On a side note, I've also read that printf() can be called from within a thread and this also throws an error: "Cannot call host function from inside device/global function".

Is there anything I can do to make a variable-sized array or equivalent inside CUDA? I am at compute capability 1.1; does this have anything to do with it? Can I get around the variable-size array declarations from within a thread by defining a typedef struct that has a size variable I can set? Solutions for compute capabilities besides 1.1 are also welcome. This is for a class team project, and if there is some way to do it, I can at least present that information.

Upvotes: 1

Views: 3698

Answers (2)

Vlad

Reputation: 18633

If your arrays can be large, one solution is a two-pass approach: run one kernel that computes the required array sizes and stores them in an array; after that invocation, the host allocates the necessary arrays and passes an array of pointers (or offsets) to the threads; then you run your computation as a second kernel.

Whether this helps depends on what you have to do, because the arrays would be allocated in global memory. If the total size (per block) of your arrays is less than the available shared memory, you could declare a sufficiently large shared memory array and let your threads negotiate splitting it amongst themselves.
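A minimal sketch of the two-pass, global-memory variant might look like this. All names here (`computeSizes`, `fillArrays`) and the size formula are placeholders for your real per-thread computation; the host turns the sizes into offsets with a prefix sum so a single allocation can back all the per-thread arrays:

```cuda
#include <cstdio>

__global__ void computeSizes(int *sizes, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        sizes[i] = (i % 4) + 1;        // stand-in for the real size calculation
}

__global__ void fillArrays(int *data, const int *offsets,
                           const int *sizes, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int *myArray = data + offsets[i];   // this thread's slice of global memory
        for (int j = 0; j < sizes[i]; ++j)
            myArray[j] = j;
    }
}

int main()
{
    const int n = 256;
    int sizes[n], offsets[n];
    int *dSizes;
    cudaMalloc(&dSizes, n * sizeof(int));

    // Pass 1: compute the per-thread sizes on the device.
    computeSizes<<<(n + 127) / 128, 128>>>(dSizes, n);
    cudaMemcpy(sizes, dSizes, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Exclusive prefix sum on the host turns sizes into per-thread offsets.
    int total = 0;
    for (int i = 0; i < n; ++i) { offsets[i] = total; total += sizes[i]; }

    // One allocation backs every thread's array.
    int *dData, *dOffsets;
    cudaMalloc(&dData, total * sizeof(int));
    cudaMalloc(&dOffsets, n * sizeof(int));
    cudaMemcpy(dOffsets, offsets, n * sizeof(int), cudaMemcpyHostToDevice);

    // Pass 2: each thread works on its own slice.
    fillArrays<<<(n + 127) / 128, 128>>>(dData, dOffsets, dSizes, n);
    cudaDeviceSynchronize();
    return 0;
}
```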

Upvotes: 0

jmsu

Reputation: 2053

About the printf: the problem is that it only works on devices of compute capability 2.x. As an alternative, there is cuPrintf that you might try.

For the allocation of variable size arrays in CUDA you do it like this:

  • Inside the kernel you write extern __shared__ int array[]; (the extern keyword and a name are required; a plain int array[]; is what gives the "incomplete type is not allowed" error)
  • In the kernel call you pass the shared memory size in bytes as the third launch parameter, like mykernel<<<gridsize, blocksize, sharedmemsize>>>();

This is explained in the CUDA C programming guide in section B.2.3 about the __shared__ qualifier.
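Putting the two steps above together, a minimal sketch (the kernel name and launch parameters here are made up for illustration) would be:

```cuda
__global__ void mykernel(int arraySize)
{
    extern __shared__ int array[];   // actual size is supplied at launch, in bytes
    int tid = threadIdx.x;
    if (tid < arraySize)
        array[tid] = tid;
    __syncthreads();
    // ... work with array ...
}

// Launch with the shared memory size as the third parameter:
// int arraySize = 10;
// mykernel<<<gridsize, blocksize, arraySize * sizeof(int)>>>(arraySize);
```

Note that all extern __shared__ declarations in a kernel alias the same memory, so if you need several arrays you have to carve them out of one buffer yourself.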

Upvotes: 2

Related Questions