user1431515

Reputation: 185

cudaMemcpy an array of pointers where each pointer points to an array

I am attempting to create an array of pointers on the host, where each pointer in the array points to an array of 4 ints. When I try to copy one of these arrays to the device, the copy appears to fail: the device cannot access the contents of the array that the pointer refers to. How do I copy an array that is referenced through an array of pointers from the host to the device?

__global__ void kernel(int* D)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < 4)
    {
        printf("Device = %d\n", D[tid]);
        tid += blockDim.x * gridDim.x;
    }
}

int main(void)
{
    cudaProfilerStart();

    int* H[2];
    int* D[2]; 
    int test1[4] = { 1, 2, 3, 4 };
    int test2[4] = { 10, 20, 30, 40 };

    H[0] = test1;
    H[1] = test2;

    HANDLE_ERROR(cudaMalloc((void**)&D[0], 4 * sizeof(int)));
    HANDLE_ERROR(cudaMemcpy(D[0], H[0], 4 * sizeof(int), cudaMemcpyHostToDevice));
    kernel <<<1, 4 >>>(D[0]);

    cudaProfilerStop();

    return 0;
}

Upvotes: 1

Views: 781

Answers (1)

Nirvedh Meshram

Reputation: 469

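As talonmies pointed out, there is nothing wrong with the code. However, you will not see the output of the printf calls in your kernel, because the kernel launch is asynchronous and your process exits before the device-side output can be flushed. Adding a synchronization call such as cudaDeviceSynchronize() after the launch solves the problem here. In real code an explicit synchronization is often unnecessary, because a later blocking call, such as a cudaMemcpy that copies results back to the host, already waits for the kernel to finish.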

#include <stdio.h>
#include <cuda_runtime.h>

// Grid-stride loop: each thread prints the elements it is responsible for.
__global__ void kernel(int* D)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < 4)
    {
        printf("Device = %d\n", D[tid]);
        tid += blockDim.x * gridDim.x;
    }
}

int main(void)
{
    int* H[2];   // host array of pointers to host arrays
    int* D[2];   // host array that holds device pointers
    int test1[4] = { 1, 2, 3, 4 };
    int test2[4] = { 10, 20, 30, 40 };

    H[0] = test1;
    H[1] = test2;

    // Allocate device memory for the first array and copy it over.
    cudaMalloc((void**)&D[0], 4 * sizeof(int));
    cudaMemcpy(D[0], H[0], 4 * sizeof(int), cudaMemcpyHostToDevice);
    kernel<<<1, 1>>>(D[0]);

    // Block until the kernel finishes so its printf output is flushed before exit.
    cudaError_t cudaerr1 = cudaDeviceSynchronize();
    if (cudaerr1 != cudaSuccess)
        printf("kernel launch failed with error \"%s\".\n",
               cudaGetErrorString(cudaerr1));

    cudaFree(D[0]);

    return 0;
}
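
If the goal is to let the kernel reach both arrays through a single pointer argument, one common approach is to also copy the table of device pointers to the device. The following is only a sketch under some assumptions: the names NUM_ARRAYS, ARRAY_LEN, CHECK, kernel2d and d_table are made up for illustration, and CHECK is a stand-in for the HANDLE_ERROR macro in the question, whose definition was not posted.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define NUM_ARRAYS 2   // hypothetical names, not from the original post
#define ARRAY_LEN  4

// Hypothetical stand-in for HANDLE_ERROR: abort if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Each thread prints one element of every array reachable through the table.
__global__ void kernel2d(int** table)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < ARRAY_LEN) {
        for (int a = 0; a < NUM_ARRAYS; ++a)
            printf("Device: array %d, element %d = %d\n", a, tid, table[a][tid]);
    }
}

int main(void)
{
    int test1[ARRAY_LEN] = { 1, 2, 3, 4 };
    int test2[ARRAY_LEN] = { 10, 20, 30, 40 };
    int* H[NUM_ARRAYS] = { test1, test2 };  // host pointers to host data
    int* D[NUM_ARRAYS];                     // host-side copy of the device pointers

    // Step 1: allocate and fill one device buffer per host array.
    for (int a = 0; a < NUM_ARRAYS; ++a) {
        CHECK(cudaMalloc((void**)&D[a], ARRAY_LEN * sizeof(int)));
        CHECK(cudaMemcpy(D[a], H[a], ARRAY_LEN * sizeof(int),
                         cudaMemcpyHostToDevice));
    }

    // Step 2: copy the table of device pointers itself into device memory,
    // because the kernel cannot dereference D, which lives on the host.
    int** d_table = NULL;
    CHECK(cudaMalloc((void**)&d_table, NUM_ARRAYS * sizeof(int*)));
    CHECK(cudaMemcpy(d_table, D, NUM_ARRAYS * sizeof(int*),
                     cudaMemcpyHostToDevice));

    kernel2d<<<1, ARRAY_LEN>>>(d_table);
    CHECK(cudaGetLastError());       // reports launch errors
    CHECK(cudaDeviceSynchronize());  // waits for the kernel so printf output appears

    for (int a = 0; a < NUM_ARRAYS; ++a)
        CHECK(cudaFree(D[a]));
    CHECK(cudaFree(d_table));
    return 0;
}
```

The important detail is that D holds device addresses but is itself a host array, so passing D directly to a kernel would hand it a host address. If all inner arrays have the same length, flattening them into one contiguous buffer and indexing with a * ARRAY_LEN + tid avoids the second copy and is usually simpler.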

Upvotes: 1
