CUDA copying an array of arrays filled with data, from host to device

Question

I've been looking for a way to transfer a filled array of arrays from host to device in CUDA.

What I have:

A global array of arrays that is filled with data, which I need to copy to the device for kernel execution.
The arrays in the array have different lengths.

I have a function to initialize the array and its values:

double** weights; // globally defined in host
int init_weigths(){
    weights = (double**) malloc(sizeof(double*) * SIZE);

    for (int i = 0; i < SIZE; i++) {
        weights[i] = (double*) malloc(sizeof(double) * getSize(i));

        for (int j = 0; j < getSize(i); j++){
            weights[i][j] = get_value(i,j);
        }
    }
    return 0; // the function is declared int, so return a status
}

My (not working) solution:

I've put together a solution from other answers found on the Internet, but none of them worked. I think it's because my array of arrays is already filled with data, and because the contained arrays have variable lengths.

My solution throws an "invalid argument" error in every cudaMemcpy call and in the second and subsequent cudaMalloc calls, as checked with cudaGetLastError(). Here it is:

double** d_weights;
int init_cuda_weight(){
    cudaMalloc((void **) &d_weights, sizeof(double*) * SIZE);

    double** temp_d_ptrs = (double**) malloc(sizeof(double*) * SIZE);
    // temp array of device pointers
    for (int i = 0; i < SIZE; i++){
        cudaMalloc((void**) &temp_d_ptrs[getSize(i)],
                sizeof(double) * getSize(i));
        // ERROR CHECK WITH cudaGetLastError(); doesn't throw any errors at first.
        cudaMemcpy(temp_d_ptrs[getSize(i)], weights[getSize(i)], sizeof(double) * getSize(i), cudaMemcpyHostToDevice);
        // ERROR CHECK WITH cudaGetLastError(); throws "invalid argument" from here on.
    }

   cudaMemcpy(d_weights, temp_d_ptrs, sizeof(double*) * SIZE,
        cudaMemcpyHostToDevice);
}

As additional information, I've simplified the code a bit. The arrays contained in the array of arrays have different lengths (i.e. SIZE2 isn't constant), which is why I'm not flattening it into a 1D array.

What is wrong with this implementation? Any ideas to achieve the copy?

Edit2: The original code I wrote was OK. I edited the code to include the error I had and included the correct answer (code) below.

Vichoko · Accepted Answer

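The mistake is that I used the array's total size, getSize(i), as the index in the allocations and copies, instead of the row index i. As a result, temp_d_ptrs[getSize(i)] and weights[getSize(i)] referred to the wrong (possibly out-of-range) slots, which is consistent with the "invalid argument" errors. It was a naive error hidden by the complexity and verbosity of the real code.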

The correct solution is:

double** d_weights; // device array holding one device pointer per row
int init_cuda_weight(){
    cudaMalloc((void **) &d_weights, sizeof(double*) * SIZE);

    // temp host array of device pointers; keep it if the rows must be cudaFree'd later
    double** temp_d_ptrs = (double**) malloc(sizeof(double*) * SIZE);
    for (int i = 0; i < SIZE; i++){
        // index by the row number i; getSize(i) is only the row's length
        cudaMalloc((void**) &temp_d_ptrs[i],
                sizeof(double) * getSize(i));
        // ERROR CHECK WITH cudaGetLastError()
        cudaMemcpy(temp_d_ptrs[i], weights[i], sizeof(double) * getSize(i), cudaMemcpyHostToDevice);
        // ERROR CHECK WITH cudaGetLastError()
    }

    // copy the table of device pointers itself to the device
    cudaMemcpy(d_weights, temp_d_ptrs, sizeof(double*) * SIZE,
        cudaMemcpyHostToDevice);
    return 0;
}
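
For anyone who wants the error checks spelled out, here is a minimal sketch of the same deep copy with every runtime call checked, plus a small kernel that reads the copied rows. cudaMalloc and cudaMemcpy return a cudaError_t directly, so the sketch checks the return value instead of calling cudaGetLastError() after each call. The names CUDA_CHECK, copy_jagged_to_device, free_jagged_on_device, row_sums and sum_rows_on_device are made up for illustration and are not part of the original code. The row lengths are passed in as an int array instead of coming from getSize(i).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string and abort on failure.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Deep-copies a jagged host array (row i has len[i] doubles) to the device.
// Returns the device array of device row pointers. *h_ptrs_out receives a
// host array with the same device row pointers, needed to free the rows.
double** copy_jagged_to_device(double** h_rows, const int* len, int rows,
                               double*** h_ptrs_out) {
    double** h_ptrs = (double**) malloc(sizeof(double*) * rows);
    if (h_ptrs == NULL) { fprintf(stderr, "malloc failed\n"); exit(EXIT_FAILURE); }
    for (int i = 0; i < rows; i++) {
        // Index by the row number i, never by the row length.
        CUDA_CHECK(cudaMalloc((void**) &h_ptrs[i], sizeof(double) * len[i]));
        CUDA_CHECK(cudaMemcpy(h_ptrs[i], h_rows[i], sizeof(double) * len[i],
                              cudaMemcpyHostToDevice));
    }
    double** d_rows = NULL;
    CUDA_CHECK(cudaMalloc((void**) &d_rows, sizeof(double*) * rows));
    CUDA_CHECK(cudaMemcpy(d_rows, h_ptrs, sizeof(double*) * rows,
                          cudaMemcpyHostToDevice));
    *h_ptrs_out = h_ptrs;
    return d_rows;
}

// Frees every device row, the device pointer table, and the host staging array.
void free_jagged_on_device(double** d_rows, double** h_ptrs, int rows) {
    for (int i = 0; i < rows; i++) CUDA_CHECK(cudaFree(h_ptrs[i]));
    CUDA_CHECK(cudaFree(d_rows));
    free(h_ptrs);
}

// One thread per row: sums the row through the device pointer table.
__global__ void row_sums(double* const* rows, const int* len, int n, double* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double s = 0.0;
        for (int j = 0; j < len[i]; j++) s += rows[i][j];
        out[i] = s;
    }
}

// Copies the jagged array over, runs row_sums, and writes the sums to h_out.
void sum_rows_on_device(double** h_rows, const int* len, int rows, double* h_out) {
    double** h_ptrs = NULL;
    double** d_rows = copy_jagged_to_device(h_rows, len, rows, &h_ptrs);

    int* d_len = NULL;
    double* d_out = NULL;
    CUDA_CHECK(cudaMalloc((void**) &d_len, sizeof(int) * rows));
    CUDA_CHECK(cudaMalloc((void**) &d_out, sizeof(double) * rows));
    CUDA_CHECK(cudaMemcpy(d_len, len, sizeof(int) * rows, cudaMemcpyHostToDevice));

    int threads = 128;
    int blocks = (rows + threads - 1) / threads;
    row_sums<<<blocks, threads>>>(d_rows, d_len, rows, d_out);
    CUDA_CHECK(cudaGetLastError()); // catches kernel launch errors
    // This blocking copy on the default stream also waits for the kernel.
    CUDA_CHECK(cudaMemcpy(h_out, d_out, sizeof(double) * rows,
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_len));
    CUDA_CHECK(cudaFree(d_out));
    free_jagged_on_device(d_rows, h_ptrs, rows);
}
```

Keeping the host-side array of device pointers around is a deliberate choice in this sketch: the device rows can only be released with cudaFree from the host, and the host cannot dereference the device pointer table to recover them.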
