Reputation: 373
I've been looking for a way to transfer a filled array of arrays from host to device in CUDA.
What I have:
I have a function to initialize the array and its values:
double** weights; // globally defined on the host
int init_weigths(){
    weights = (double**) malloc(sizeof(double*) * SIZE);
    for (int i = 0; i < SIZE; i++) {
        // each inner array has its own length, given by getSize(i)
        weights[i] = (double*) malloc(sizeof(double) * getSize(i));
        for (int j = 0; j < getSize(i); j++){
            weights[i][j] = get_value(i,j);
        }
    }
    return 0; // the function is declared int, so return a status
}
My (not working) solution:
I put together a solution from other answers found on the Internet, but none of them worked. I think that is because my array of arrays is already filled with data, and because the contained arrays have variable lengths.
The solution I have throws an "invalid argument" error in every cudaMemcpy call, and in the second and subsequent cudaMalloc calls, as checked with cudaGetLastError().
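To pin down which call fails first, one option is to test the cudaError_t that each runtime call returns, instead of polling cudaGetLastError() afterwards. A minimal sketch; the helper name cuda_ok is hypothetical and not part of my real code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption, not from the original code):
// prints a message for a failed CUDA runtime call and returns false,
// or returns true when the call succeeded.
static bool cuda_ok(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Each call can then be wrapped directly, for example cuda_ok(cudaMalloc((void**) &d_weights, sizeof(double*) * SIZE), "cudaMalloc d_weights"), so the first failing call is named in the output.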
The solution is this one:
double** d_weights;
int init_cuda_weight(){
    cudaMalloc((void **) &d_weights, sizeof(double*) * SIZE);
    double** temp_d_ptrs = (double**) malloc(sizeof(double*) * SIZE);
    // temp array of device pointers
    for (int i = 0; i < SIZE; i++){
        cudaMalloc((void**) &temp_d_ptrs[getSize(i)],
                   sizeof(double) * getSize(i));
        // ERROR CHECK WITH cudaGetLastError(); doesn't throw any errors at first.
        cudaMemcpy(temp_d_ptrs[getSize(i)], weights[getSize(i)], sizeof(double) * getSize(i), cudaMemcpyHostToDevice);
        // ERROR CHECK WITH cudaGetLastError(); throws "invalid argument" from here on.
    }
    cudaMemcpy(d_weights, temp_d_ptrs, sizeof(double*) * SIZE,
               cudaMemcpyHostToDevice);
    return 0;
}
As additional information, I've simplified the code a bit. The arrays contained in the array of arrays have different lengths (i.e. SIZE2 isn't constant), which is why I'm not flattening to a 1D array.
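For reference, variable row lengths do not by themselves rule out a flat layout: one possible alternative (a sketch only, not what the code in this post does) stores all rows back to back and keeps a table of row start offsets. The names FlatRows, flatten and flat_at are hypothetical, and lens stands in for the values returned by getSize(i):

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical container: every row stored back to back, plus row starts.
struct FlatRows {
    double* data;    // offsets[n] elements in total
    int* offsets;    // n + 1 entries; row i spans data[offsets[i] .. offsets[i+1])
    int n;           // number of rows
};

// Packs a ragged host array into one contiguous host buffer.
FlatRows flatten(double** rows, const int* lens, int n) {
    FlatRows f;
    f.n = n;
    f.offsets = (int*) malloc(sizeof(int) * (n + 1));
    f.offsets[0] = 0;
    for (int i = 0; i < n; i++) {
        f.offsets[i + 1] = f.offsets[i] + lens[i];
    }
    f.data = (double*) malloc(sizeof(double) * f.offsets[n]);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < lens[i]; j++) {
            f.data[f.offsets[i] + j] = rows[i][j];
        }
    }
    return f;
}

// Element j of row i in the flat layout; callable from host or device code.
__host__ __device__ inline double flat_at(const double* data, const int* offsets,
                                          int i, int j) {
    return data[offsets[i] + j];
}
```

With this layout the whole structure would go to the device with two cudaMalloc/cudaMemcpy pairs (one for data, one for offsets), at the cost of an extra offsets lookup per access.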
What is wrong with this implementation? Any ideas to achieve the copy?
Edit2: The original code I wrote was OK. I edited the code to include the error I had and included the correct answer (code) below.
Upvotes: 0
Views: 1200
Reputation: 373
The mistake is that I used the array's total size, getSize(i), as the index for the allocations and copies instead of the loop index i. It was a naive error hidden by the complexity and verbosity of the real code.
The correct solution is:
double** d_weights;
int init_cuda_weight(){
    cudaMalloc((void **) &d_weights, sizeof(double*) * SIZE);
    double** temp_d_ptrs = (double**) malloc(sizeof(double*) * SIZE);
    // temp array of device pointers, kept on the host
    for (int i = 0; i < SIZE; i++){
        // index by i; getSize(i) is only the length of row i
        cudaMalloc((void**) &temp_d_ptrs[i],
                   sizeof(double) * getSize(i));
        // ERROR CHECK WITH cudaGetLastError()
        cudaMemcpy(temp_d_ptrs[i], weights[i], sizeof(double) * getSize(i), cudaMemcpyHostToDevice);
        // ERROR CHECK WITH cudaGetLastError()
    }
    // copy the table of device pointers itself to the device
    cudaMemcpy(d_weights, temp_d_ptrs, sizeof(double*) * SIZE,
               cudaMemcpyHostToDevice);
    free(temp_d_ptrs); // the host-side copy of the pointer table is no longer needed
    return 0;
}
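To check that the copy really arrived, the same pattern can be written with the return codes tested and a small kernel that reads the data back through the pointer table. This is a sketch under assumptions: upload_ragged, device_row_sums and the row_sums kernel are hypothetical names, lens stands in for getSize(i), and cleanup of device memory on failure is left out for brevity.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical verification kernel: one thread per row sums that row.
__global__ void row_sums(double** rows, const int* lens, int n, double* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double s = 0.0;
        for (int j = 0; j < lens[i]; j++) s += rows[i][j];
        out[i] = s;
    }
}

// Copies a ragged host array to the device, indexing by the row number i.
// Returns the device pointer table, or NULL if any call failed.
double** upload_ragged(double** host_rows, const int* lens, int n) {
    double** d_rows = NULL;
    double** tmp = (double**) malloc(sizeof(double*) * n); // host copy of device pointers
    if (tmp == NULL) return NULL;
    bool ok = cudaMalloc((void**) &d_rows, sizeof(double*) * n) == cudaSuccess;
    for (int i = 0; ok && i < n; i++) {
        ok = cudaMalloc((void**) &tmp[i], sizeof(double) * lens[i]) == cudaSuccess
          && cudaMemcpy(tmp[i], host_rows[i], sizeof(double) * lens[i],
                        cudaMemcpyHostToDevice) == cudaSuccess;
    }
    ok = ok && cudaMemcpy(d_rows, tmp, sizeof(double*) * n,
                          cudaMemcpyHostToDevice) == cudaSuccess;
    free(tmp);
    return ok ? d_rows : NULL;
}

// Runs row_sums on the device and copies the n per-row sums into out_host.
bool device_row_sums(double** d_rows, const int* lens, int n, double* out_host) {
    int* d_lens = NULL;
    double* d_out = NULL;
    bool ok = cudaMalloc((void**) &d_lens, sizeof(int) * n) == cudaSuccess
           && cudaMalloc((void**) &d_out, sizeof(double) * n) == cudaSuccess
           && cudaMemcpy(d_lens, lens, sizeof(int) * n,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        row_sums<<<blocks, threads>>>(d_rows, d_lens, n, d_out);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out_host, d_out, sizeof(double) * n,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_lens); // cudaFree(NULL) is a no-op
    cudaFree(d_out);
    return ok;
}
```

Comparing the sums returned by device_row_sums with sums computed on the host from weights gives a quick sanity check that every row landed at the right index.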
Upvotes: 2