user4110632

How to pass struct containing array to the kernel in CUDA?

In the following code, I have a struct containing an array, and I need to pass it to a kernel function. I can't find the proper way to do this. I have looked at other posts on SO, but I don't fully understand their methods.

In my actual code, the function that launches the kernel receives two structs as pointer arguments. I therefore need to copy the contents of these 'argument structs' into 'GPU memory structs' and pass those to the kernel.

#include <stdio.h>
#include <stdlib.h>

typedef struct{
    int *pass;
    int element;
}Pass;

__global__ void hello(int *a, int *b, Pass *p){

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if(i < *b)
        a[i] = p -> pass[i] + p -> element;
}

int main(){

    int *a_host, b_host = 5;
    int *a_gpu, *b_gpu;
    Pass *p, *p_gpu;

    a_host = (int*)malloc(sizeof(int) * 5); 
    cudaMalloc(&a_gpu, 5 * sizeof(int));

    cudaMalloc(&b_gpu, sizeof(int));
    cudaMemcpy(b_gpu, &b_host, sizeof(int), cudaMemcpyHostToDevice);

    p = (Pass*)malloc(sizeof(Pass));
    p -> pass = (int*)malloc(5 * sizeof(int));

    for(int i = 0;i < 5;i++)
        p -> pass[i] = i;
    p -> element = 5;

    cudaMalloc(&p_gpu, sizeof(Pass));
    cudaMemcpy(p_gpu, p, sizeof(Pass), cudaMemcpyHostToDevice);

    int numBlocks = 1;
    int threadPerBlock = 512;


    hello<<<numBlocks, threadPerBlock>>>(a_gpu, b_gpu, p_gpu);

    cudaMemcpy(a_host, a_gpu, 5 * sizeof(int), cudaMemcpyDeviceToHost);

    int i;
    for(i = 0;i < 5;i++)
        printf("a[%d]: %d\n", i, a_host[i]);

    cudaFree(p_gpu);
    cudaFree(a_gpu);
    cudaFree(b_gpu);

    free(p);
    free(a_host);

    return(0);
}

Upvotes: 0

Views: 2944

Answers (1)

talonmies

Reputation: 72348

The code in this question went through three different versions in the first 24 hours of its existence. This answer addresses the final one.

The underlying problem you are having is with this type of operation:

cudaMalloc(&p_gpu, sizeof(Pass));
cudaMalloc(&p_gpu -> pass, 5 * sizeof(int));

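The second cudaMalloc is illegal. cudaMalloc runs on the host and stores the new allocation's address through its first argument, but &p_gpu -> pass is an address in device memory, which host code cannot write to directly. A segfault is the typical result.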

The correct process to allocate a structure on the device which includes pointers to other memory allocations is as follows:

  1. Allocate device memory for each array or object that the structure's pointers will refer to
  2. Assign those device allocations to the pointer members of a copy of the structure in host memory
  3. Copy the fully initialised structure from host memory to device memory

In the code in this question, this might look like this:

// p_hgpu is the device structure copy in host memory
// p_gpu is the device structure copy in device memory
Pass *p_gpu, *p_hgpu;
p_hgpu = (Pass*)malloc(sizeof(Pass));
p_hgpu->element = p->element;
cudaMalloc(&(p_hgpu->pass), sizeof(int) * 5);
cudaMemcpy(p_hgpu->pass, p->pass, sizeof(int) * 5, cudaMemcpyHostToDevice);

// copy p_hgpu to the device
cudaMalloc(&p_gpu, sizeof(Pass));
cudaMemcpy(p_gpu, p_hgpu, sizeof(Pass), cudaMemcpyHostToDevice);

While this is straightforward, it contains several subtleties which are probably only self-evident if you possess a sound grasp of pointers and their use in C++. That might be the missing piece of the puzzle in this case.
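
One of those subtleties is that the staging structure lives in host memory while its pass member holds a device address, so that member must never be dereferenced on the host and has to be released with cudaFree rather than free. The following is a minimal end-to-end sketch of the three steps plus cleanup, not the asker's exact program. The names CHECK, add_element_on_gpu and staged are made up for illustration, the element count is passed to the kernel by value instead of through a device pointer, and the staging structure sits on the stack instead of being malloc'd.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct {
    int *pass;
    int element;
} Pass;

// Hypothetical helper: print the failing call and return 1 from the caller.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            return 1;                                            \
        }                                                        \
    } while (0)

__global__ void hello(int *a, int n, const Pass *p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = p->pass[i] + p->element;
}

// Hypothetical wrapper: computes out[i] = src->pass[i] + src->element on the GPU.
// Assumes n > 0. Returns 0 on success and 1 on a CUDA error; for brevity,
// the error path does not release earlier allocations.
int add_element_on_gpu(const Pass *src, int n, int *out) {
    Pass staged;          // host-resident struct whose pointer member is a device address
    Pass *p_gpu = NULL;   // the same struct, resident in device memory
    int *a_gpu = NULL;

    // Step 1: allocate and fill the device array the struct will point to.
    staged.pass = NULL;
    CHECK(cudaMalloc(&staged.pass, n * sizeof(int)));
    CHECK(cudaMemcpy(staged.pass, src->pass, n * sizeof(int),
                     cudaMemcpyHostToDevice));

    // Step 2: the host-side copy now carries the device pointer and plain members.
    staged.element = src->element;

    // Step 3: copy the fully initialised struct to the device.
    CHECK(cudaMalloc(&p_gpu, sizeof(Pass)));
    CHECK(cudaMemcpy(p_gpu, &staged, sizeof(Pass), cudaMemcpyHostToDevice));

    CHECK(cudaMalloc(&a_gpu, n * sizeof(int)));
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    hello<<<blocks, threads>>>(a_gpu, n, p_gpu);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(out, a_gpu, n * sizeof(int), cudaMemcpyDeviceToHost));

    // The inner array is reachable only through the host-side copy of the struct.
    CHECK(cudaFree(staged.pass));
    CHECK(cudaFree(p_gpu));
    CHECK(cudaFree(a_gpu));
    return 0;
}
```

Called from a main function with the question's data (pass holding 0 through 4 and element set to 5), this should fill out with 5 through 9 on a machine with a working CUDA device. Keeping the host-side copy around until cleanup is a deliberate choice: it is the only place the host can read the inner device pointer without first copying the struct back.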

Upvotes: 2
