BugShotGG

Reputation: 5190

How to allocate Array of Pointers and preserve them for multiple kernel calls in cuda

I am trying to implement an algorithm in CUDA and need to allocate an array of pointers that point into an array of structs. My struct is, let's say:

    typedef struct {
       float x, y; 
    } point;

I understand that if I want the arrays to persist across multiple kernel calls, I have to allocate and manage them from the host; is that right? The pointers themselves must be initialized inside the kernel. To be more specific, the array of structs P will contain Cartesian points in random order, while dev_S_x will be a version of the points in P sorted by x coordinate.

I have tried with:

    __global__ void test( point *dev_P, point **dev_S_x) {
        unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;

        dev_P[tid].x = 3.141516;
        dev_P[tid].y = 3.141516;
        dev_S_x[tid] = &dev_P[tid];
        ...
    }

and:

    int main( void ) {
        point *P, *dev_P, **S_x, *dev_S_x;
        P   = (point*)  malloc (N * sizeof (point) );
        S_x = (point**) malloc (N * sizeof (point*));

        // allocate the memory on the GPU
        cudaMalloc( (void**)  &dev_P,   N * sizeof(point) );
        cudaMalloc( (void***)  &dev_S_x, N * sizeof(point*));

        // copy the array P to the GPU
        cudaMemcpy( dev_P, P,  N * sizeof(point),  cudaMemcpyHostToDevice);
        cudaMemcpy( dev_S_x,S_x,N * sizeof(point*), cudaMemcpyHostToDevice);

        test <<<1, 1 >>>( dev_P, &dev_S_x);
        ...
        return 0;
    }

which leads to many errors like this:

    First-chance exception at 0x000007fefcc89e5d (KernelBase.dll) in Test_project_cuda.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0020f920..
    Critical error detected c0000374

Am I doing something wrong in the cudaMalloc call for the array of pointers, or is it something else? Is the use of (void***) correct? I would like to use, for example, dev_S_x[tid]->x or dev_S_x[tid]->y inside the kernels, with the pointers referring to device memory addresses. Is that feasible? Thanks in advance.

Upvotes: 1

Views: 375

Answers (1)

Tom

Reputation: 21138

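dev_S_x should be declared as point ** (not point *) and passed to the kernel by value, i.e. test <<<1, 1 >>>(dev_P, dev_S_x); rather than by address. In the original call, &dev_S_x is the address of a host variable, which is not a valid device address for the kernel to write through.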
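
A minimal sketch of that fix, to show the pointer array surviving between two kernel launches. The names N, CHECK, sum_xy and dev_out are assumptions added for illustration and are not part of the original code. The pointer array is filled entirely on the device here, so the sketch skips the host-side S_x buffer and its cudaMemcpy, and it uses the usual (void**) cast for every cudaMalloc call.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct {
    float x, y;
} point;

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error: %s (line %d)\n",          \
                    cudaGetErrorString(err_), __LINE__);           \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// First kernel: fill dev_P and store a device pointer to each element.
__global__ void test(point *dev_P, point **dev_S_x, int n) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= n) return;  // guard threads past the end of the arrays

    dev_P[tid].x = 3.141516f;
    dev_P[tid].y = 3.141516f;
    dev_S_x[tid] = &dev_P[tid];
}

// Second kernel (hypothetical): read through the pointers stored earlier.
__global__ void sum_xy(point **dev_S_x, float *dev_out, int n) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= n) return;

    dev_out[tid] = dev_S_x[tid]->x + dev_S_x[tid]->y;
}

int main(void) {
    const int N = 256;            // assumed problem size
    const int threads = 128;
    const int blocks = (N + threads - 1) / threads;

    point  *dev_P   = NULL;       // array of structs on the device
    point **dev_S_x = NULL;       // array of device pointers, also on the device
    float  *dev_out = NULL;
    float   out[N];

    CHECK(cudaMalloc((void**)&dev_P,   N * sizeof(point)));
    CHECK(cudaMalloc((void**)&dev_S_x, N * sizeof(point*)));
    CHECK(cudaMalloc((void**)&dev_out, N * sizeof(float)));

    // Both device pointers are passed by value.
    test<<<blocks, threads>>>(dev_P, dev_S_x, N);
    CHECK(cudaGetLastError());

    // The allocations stay valid until cudaFree, so a later kernel can reuse them.
    sum_xy<<<blocks, threads>>>(dev_S_x, dev_out, N);
    CHECK(cudaGetLastError());

    CHECK(cudaMemcpy(out, dev_out, N * sizeof(float), cudaMemcpyDeviceToHost));
    printf("out[0] = %f\n", out[0]);

    CHECK(cudaFree(dev_out));
    CHECK(cudaFree(dev_S_x));
    CHECK(cudaFree(dev_P));
    return 0;
}
```

The pointers stored in dev_S_x are device addresses, so they are only meaningful inside kernels; copying the array back to the host and dereferencing its entries there would not work.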

Putting that to one side, what you describe sounds like a natural fit for Thrust, which will give you a simpler memory management strategy and access to fast sort routines.
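
As a rough illustration of the Thrust route, the sketch below keeps a sorted copy of the points instead of an array of pointers, which is one possible design and not necessarily what the original algorithm needs. The comparator by_x, the size N and the test data are assumptions made for the example.

```cuda
#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

typedef struct {
    float x, y;
} point;

// Hypothetical comparator: order points by their x coordinate.
struct by_x {
    __host__ __device__
    bool operator()(const point &a, const point &b) const {
        return a.x < b.x;
    }
};

int main(void) {
    const int N = 8;  // assumed problem size

    // Build some unsorted points on the host (illustrative data only).
    thrust::host_vector<point> P(N);
    for (int i = 0; i < N; ++i) {
        P[i].x = (float)((i * 5) % N);
        P[i].y = (float)i;
    }

    // Copy to the device; device_vector owns the memory until it goes out of scope.
    thrust::device_vector<point> dev_P = P;

    // dev_S_x starts as a copy of dev_P and is then sorted by x on the device.
    thrust::device_vector<point> dev_S_x = dev_P;
    thrust::sort(dev_S_x.begin(), dev_S_x.end(), by_x());

    // A raw device pointer can be handed to hand-written kernels if needed.
    point *raw_S_x = thrust::raw_pointer_cast(dev_S_x.data());
    (void)raw_S_x;  // unused in this sketch

    // Copy the sorted points back and print them.
    thrust::host_vector<point> S_x = dev_S_x;
    for (int i = 0; i < N; ++i) {
        printf("(%g, %g)\n", S_x[i].x, S_x[i].y);
    }
    return 0;
}
```

Because both device_vector objects live for the whole of main, any number of kernels launched in between can use dev_P and dev_S_x without reallocating.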

Upvotes: 1
