I'm trying to port a linear problem to CUDA in order to speed up solve times. I have successfully used cusolverDn to handle dense problems on the GPU. However, when I try to solve sparse problems with cusolverSpScsrlsvchol, I keep getting a segmentation fault.
To debug the issue, I ran the program under compute-sanitizer with the memcheck tool and received the following output:
$ /c/Programme/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.7/bin/compute-sanitizer.bat --tool memcheck bin/FEMaster_gpu.exe
========= COMPUTE-SANITIZER
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 0 errors
Segmentation fault
I narrowed down the problem to the following minimal code snippet:
cusolverSpHandle_t handle_cusolver_sp;
cusparseHandle_t handle_cusparse;
// loading handles
cusolverSpCreate(&handle_cusolver_sp);
cusparseCreate (&handle_cusparse);
// select the GPU device
cudaSetDevice(0);
// create csr arrays on cpu
float host_csr_values[4]{1,1,1,1};
int host_csr_col_id[4]{0,1,2,3};
int host_csr_row_pt[5]{0,1,2,3,4};
float host_rhs [4]{0,3,7,1};
int host_singular [1]{0};
// allocate arrays on the gpu
float* dev_csr_values;
int * dev_csr_col_id;
int * dev_csr_row_pt;
float* dev_rhs;
int * dev_singular;
runtime_assert_cuda(cudaMalloc((void**) &dev_csr_values,4 * sizeof(float)));
runtime_assert_cuda(cudaMalloc((void**) &dev_csr_col_id,4 * sizeof(int )));
runtime_assert_cuda(cudaMalloc((void**) &dev_csr_row_pt,5 * sizeof(int )));
runtime_assert_cuda(cudaMalloc((void**) &dev_rhs ,4 * sizeof(float)));
runtime_assert_cuda(cudaMalloc((void**) &dev_singular ,1 * sizeof(int )));
// move data to gpu
runtime_assert_cuda(cudaMemcpy(dev_csr_values, host_csr_values, 4 * sizeof(float), cudaMemcpyHostToDevice));
runtime_assert_cuda(cudaMemcpy(dev_csr_col_id, host_csr_col_id, 4 * sizeof(int ), cudaMemcpyHostToDevice));
runtime_assert_cuda(cudaMemcpy(dev_csr_row_pt, host_csr_row_pt, 5 * sizeof(int ), cudaMemcpyHostToDevice));
runtime_assert_cuda(cudaMemcpy(dev_rhs , host_rhs , 4 * sizeof(float), cudaMemcpyHostToDevice));
// create matrix descriptor
cusparseMatDescr_t descr;
runtime_assert_cuda(cusparseCreateMatDescr(&descr));
runtime_assert_cuda(cusparseSetMatType (descr, CUSPARSE_MATRIX_TYPE_GENERAL));
runtime_assert_cuda(cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO ));
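runtime_assert_cuda(cusolverSpScsrlsvchol(handle_cusolver_sp,
                                          4,               // m: number of rows and columns
                                          4,               // nnz: number of non-zero entries
                                          descr,
                                          dev_csr_values,
                                          dev_csr_row_pt,
                                          dev_csr_col_id,
                                          dev_rhs,         // b: right-hand side
                                          0,               // tolerance
                                          0,               // reorder
                                          dev_rhs,         // x: solution, written over the right-hand side
                                          dev_singular));  // singularity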
The CSR arrays above describe a diagonal matrix: a 4x4 matrix with ones on the diagonal.
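As a sanity check on the input, the CSR arrays can be expanded into a dense matrix on the host. The following is only a minimal sketch in plain C++ with no CUDA involved; the helper names csr_to_dense and is_identity are hypothetical and are not part of cuSOLVER or cuSPARSE. It assumes a square matrix with zero-based indices, matching CUSPARSE_INDEX_BASE_ZERO in the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuSOLVER/cuSPARSE function): expand a square,
// zero-based CSR matrix with n rows into a row-major dense array of size n * n.
std::vector<float> csr_to_dense(int n,
                                const float* values,
                                const int*   col_id,
                                const int*   row_pt) {
    std::vector<float> dense(static_cast<std::size_t>(n) * n, 0.0f);
    for (int row = 0; row < n; ++row) {
        // row_pt[row] .. row_pt[row + 1] delimits the stored entries of this row
        for (int k = row_pt[row]; k < row_pt[row + 1]; ++k) {
            assert(col_id[k] >= 0 && col_id[k] < n);  // column index must be in range
            dense[static_cast<std::size_t>(row) * n + col_id[k]] = values[k];
        }
    }
    return dense;
}

// Hypothetical helper: true if the row-major n x n matrix is the identity.
bool is_identity(const std::vector<float>& dense, int n) {
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            const float expected = (row == col) ? 1.0f : 0.0f;
            if (dense[static_cast<std::size_t>(row) * n + col] != expected) {
                return false;
            }
        }
    }
    return true;
}
```

Calling csr_to_dense(4, host_csr_values, host_csr_col_id, host_csr_row_pt) and passing the result to is_identity(..., 4) should return true for the arrays above. In that case the expected solution is simply the right-hand side {0, 3, 7, 1}, so the input data itself looks valid.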
I removed the memory deallocation, output retrieval, and other similar calls for simplicity. The code seems straightforward, but it results in a segmentation fault. The issue occurs specifically during the call to cusolverSpScsrlsvchol.
I've been stuck on this problem for over a day and I can't figure out why it's not working. Any help would be greatly appreciated!