In the following code I have a struct containing an array (held as a pointer member) which I need to pass to the kernel function, and I can't find the proper way to do it. I looked at other posts on SO but do not understand their methods well.
In my actual code, the function that launches the kernel receives two structs as pointer arguments. I therefore need to copy the contents of these 'argument structs' into 'GPU memory structs' and pass those to the kernel.
#include <stdio.h>
#include <stdlib.h>

typedef struct{
    int *pass;
    int element;
}Pass;

__global__ void hello(int *a, int *b, Pass *p){
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if(i < *b)
        a[i] = p -> pass[i] + p -> element;
}

int main(){
    int *a_host, b_host = 5;
    int *a_gpu, *b_gpu;
    Pass *p, *p_gpu;

    a_host = (int*)malloc(sizeof(int) * 5);
    cudaMalloc(&a_gpu, 5 * sizeof(int));
    cudaMalloc(&b_gpu, sizeof(int));
    cudaMemcpy(b_gpu, &b_host, sizeof(int), cudaMemcpyHostToDevice);

    p = (Pass*)malloc(sizeof(Pass));
    p -> pass = (int*)malloc(5 * sizeof(int));
    for(int i = 0;i < 5;i++)
        p -> pass[i] = i;
    p -> element = 5;

    cudaMalloc(&p_gpu, sizeof(Pass));
    cudaMemcpy(p_gpu, p, sizeof(Pass), cudaMemcpyHostToDevice);

    int numBlocks = 1;
    int threadPerBlock = 512;
    hello<<<numBlocks, threadPerBlock>>>(a_gpu, b_gpu, p_gpu);
    cudaMemcpy(a_host, a_gpu, 5 * sizeof(int), cudaMemcpyDeviceToHost);

    int i;
    for(i = 0;i < 5;i++)
        printf("a[%d]: %d\n", i, a_host[i]);

    cudaFree(p_gpu);
    cudaFree(a_gpu);
    cudaFree(b_gpu);
    free(p);
    free(a_host);
    return(0);
}
Upvotes: 0
Views: 2944
Reputation: 72348
We have seen three different versions of the code in this question in the first 24 hours of its existence. This answer addresses the final one.
The underlying problem you are having is with this type of operation:
cudaMalloc(&p_gpu, sizeof(Pass));
cudaMalloc(&p_gpu -> pass, 5 * sizeof(int));
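The second cudaMalloc is illegal. cudaMalloc writes the address of the new allocation through the pointer it is given, and here that pointer (&p_gpu -> pass) is an address in device memory. The host is therefore attempting to assign a value to a location in device memory, and a segfault will result.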
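The correct process to allocate a structure on the device which includes pointers to other memory allocations is as follows:
1. Allocate a copy of the structure in host memory and fill in its non-pointer members.
2. Allocate device memory for each pointer member, store the resulting device pointer in that host copy, and copy the data the pointer refers to into the device allocation.
3. Allocate device memory for the structure itself and copy the host copy into it.
In the code in this question, this might look like this: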
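// p is the original structure in host memory (as in the question)
// p_hgpu is the device structure copy in host memory
// p_gpu is the device structure copy in device memory
Pass *p_gpu, *p_hgpu;
p_hgpu = (Pass*)malloc(sizeof(Pass));
p_hgpu->element = p->element;
// allocate the array on the device; the device pointer is stored in the host-side copy
cudaMalloc(&(p_hgpu->pass), sizeof(int) * 5);
// copy the array contents from host memory into the device allocation
cudaMemcpy(p_hgpu->pass, p->pass, sizeof(int) * 5, cudaMemcpyHostToDevice);
// copy p_hgpu to the device
cudaMalloc(&p_gpu, sizeof(Pass));
cudaMemcpy(p_gpu, p_hgpu, sizeof(Pass), cudaMemcpyHostToDevice);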
While this is straightforward, it contains several subtleties which are probably only self-evident if you possess a sound grasp of pointers and their use in C++. That might be the missing piece of the puzzle in this case.
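For reference, here is a minimal sketch of how the process described above might fit into a complete program based on the question's code. It is an illustration under some assumptions, not the only way to write it: the CHECK macro and the run_demo wrapper are hypothetical helpers that do not appear in the question, the staging copy lives on the stack rather than being malloc'd, and the element count is passed to the kernel by value instead of through a device pointer, purely to keep the example short.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

typedef struct {
    int *pass;
    int element;
} Pass;

// Hypothetical helper (not in the question): bail out on any CUDA error.
// For brevity it returns without releasing earlier allocations.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            return 1;                                             \
        }                                                         \
    } while (0)

// n is passed by value here (an assumption; the question used a device int*).
__global__ void hello(int *a, int n, const Pass *p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = p->pass[i] + p->element;
}

// Hypothetical wrapper: fills out[0..n-1], returns 0 on success.
int run_demo(int *out, int n) {
    // Original structure in host memory, as in the question.
    Pass p;
    p.element = 5;
    p.pass = (int *)malloc(n * sizeof(int));
    if (p.pass == NULL) return 1;
    for (int i = 0; i < n; i++) p.pass[i] = i;

    // Step 1: host-side staging copy of the device structure.
    Pass p_hgpu;
    p_hgpu.element = p.element;

    // Step 2: device array; its device pointer is stored in the staging copy.
    CHECK(cudaMalloc(&p_hgpu.pass, n * sizeof(int)));
    CHECK(cudaMemcpy(p_hgpu.pass, p.pass, n * sizeof(int),
                     cudaMemcpyHostToDevice));

    // Step 3: device structure, filled from the staging copy.
    Pass *p_gpu;
    CHECK(cudaMalloc(&p_gpu, sizeof(Pass)));
    CHECK(cudaMemcpy(p_gpu, &p_hgpu, sizeof(Pass), cudaMemcpyHostToDevice));

    int *a_gpu;
    CHECK(cudaMalloc(&a_gpu, n * sizeof(int)));

    int threadsPerBlock = 128;
    int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    hello<<<numBlocks, threadsPerBlock>>>(a_gpu, n, p_gpu);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(out, a_gpu, n * sizeof(int), cudaMemcpyDeviceToHost));

    // The device array is freed through the staging copy, which is the
    // only place the host can read its address.
    CHECK(cudaFree(a_gpu));
    CHECK(cudaFree(p_gpu));
    CHECK(cudaFree(p_hgpu.pass));
    free(p.pass);
    return 0;
}

int main(void) {
    int a_host[5];
    if (run_demo(a_host, 5) != 0) return 1;
    for (int i = 0; i < 5; i++)
        printf("a[%d]: %d\n", i, a_host[i]);
    return 0;
}
```

Compiled with nvcc and run on a machine with a working CUDA device, this should print a[0]: 5 through a[4]: 9, since each output element is pass[i] + element = i + 5. Note that the staging copy is also what makes cleanup possible: p_gpu -> pass cannot be read from the host, so the device array has to be freed via p_hgpu.pass.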
Upvotes: 2