Reputation: 61
Currently I'm trying to implement a simple Linear Regression algorithm in matrix form based on cuBLAS with CUDA. Matrix multiplication and transposition work well with the cublasSgemm function.
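For example, filling the (A^T * A) product discussed below takes a single cublasSgemm call (a sketch, assuming dA is the m x n design matrix in column-major order on the device, and dProdATA is the n x n output allocated as shown further down):
const float alpha = 1.0f, beta = 0.0f;
cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
            n, n, m,               // result is n x n, shared dimension m
            &alpha, dA, m,         // op(A) = A^T
            dA, m,                 // op(B) = A
            &beta, dProdATA, n);   // C = A^T * A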
The problems begin with matrix inversion, which is based on the cublas<t>getrfBatched() and cublas<t>getriBatched() functions (see here).
As can be seen, the input parameters of these functions are arrays of pointers to matrices. Imagine that I've already allocated memory for the (A^T * A) matrix on the GPU as a result of previous calculations:
float* dProdATA;
cudaStat = cudaMalloc((void **)&dProdATA, n*n*sizeof(*dProdATA));
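For context, the P, INFO, and mybatch arguments used in the call below would be set up along these lines (a sketch; with a single n x n matrix, mybatch is 1, and both bookkeeping arrays live on the device):
int mybatch = 1;                                     // only one matrix to factorize
int *P, *INFO;
cudaMalloc((void **)&P, n * mybatch * sizeof(int));  // n pivot indices per matrix
cudaMalloc((void **)&INFO, mybatch * sizeof(int));   // one status code per matrix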
Is it possible to run the factorization (inversion)
cublasSgetrfBatched(handle, n, &dProdATA, lda, P, INFO, mybatch);
without additional HOST <-> GPU memory copying (see the working example of inverting an array of matrices) and without allocating an array with a single element, but instead just get a GPU reference to the GPU pointer?
Upvotes: 1
Views: 580
Reputation: 72349
There is no way around the requirement that the array you pass be in the device address space, and what you posted in your question won't work. You really only have two possibilities:

1. Allocate a one-element array of pointers in device memory and copy the value of your device pointer to it with an explicit HOST -> GPU transfer, or
2. use managed memory for the pointer array, so that the same allocation is addressable from both host and device.
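In the former case, something like this would do it (a sketch; note the only HOST -> GPU copy is the pointer value itself, a few bytes, not the matrix contents):
float **dBatch;
cudaMalloc((void **)&dBatch, sizeof(float *));                           // one device-side pointer slot
cudaMemcpy(dBatch, &dProdATA, sizeof(float *), cudaMemcpyHostToDevice);  // copy the pointer value
cublasSgetrfBatched(handle, n, dBatch, lda, P, INFO, mybatch);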
In the latter case, with managed memory, something like this should work (completely untested, use at your own risk):
float **batch;
cudaMallocManaged((void **)&batch, sizeof(float *));  // one managed slot for the pointer
*batch = dProdATA;                                    // store the device pointer directly from the host
cublasSgetrfBatched(handle, n, batch, lda, P, INFO, mybatch);
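Note that cublasSgetrfBatched only computes the LU factorization in place; to finish the inversion you would follow up with cublasSgetriBatched, which writes the inverse to a separate output. A hypothetical continuation (dInvATA is an assumed n*n device allocation for the result, analogous to dProdATA):
float **batchInv;
cudaMallocManaged((void **)&batchInv, sizeof(float *));
*batchInv = dInvATA;                  // assumed n*n device buffer for the inverse
cublasSgetriBatched(handle, n, (const float **)batch, lda, P, batchInv, n, INFO, mybatch);
cudaDeviceSynchronize();              // managed memory: sync before touching results on the host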
Upvotes: 1