Finn Eggers

Reputation: 945

Segmentation fault when using cusolverSpScsrlsvchol in CUDA for sparse linear problems

I'm trying to port a linear problem to CUDA in order to speed up solve times. I have successfully used cusolverDn for dense problems on the GPU. However, when I try to apply the same approach to sparse problems using cusolverSpScsrlsvchol, I keep getting a segmentation fault.

To debug the issue, I ran the CUDA Compute Sanitizer (memcheck tool) and got the following output:

$ /c/Programme/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.7/bin/compute-sanitizer.bat --tool memcheck bin/FEMaster_gpu.exe
========= COMPUTE-SANITIZER

========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 0 errors
Segmentation fault

I narrowed down the problem to the following minimal code snippet:

    cusolverSpHandle_t handle_cusolver_sp;
    cusparseHandle_t   handle_cusparse;

    // loading handles
    cusolverSpCreate(&handle_cusolver_sp);
    cusparseCreate  (&handle_cusparse);

    // select the device
    cudaSetDevice(0);

    // create csr arrays on cpu
    float host_csr_values[4]{1,1,1,1};
    int   host_csr_col_id[4]{0,1,2,3};
    int   host_csr_row_pt[5]{0,1,2,3,4};
    float host_rhs       [4]{0,3,7,1};
    int   host_singular  [1]{0};

    // allocate arrays on the gpu
    float* dev_csr_values;
    int  * dev_csr_col_id;
    int  * dev_csr_row_pt;
    float* dev_rhs;
    int  * dev_singular;

    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_values,4 * sizeof(float)));
    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_col_id,4 * sizeof(int  )));
    runtime_assert_cuda(cudaMalloc((void**) &dev_csr_row_pt,5 * sizeof(int  )));
    runtime_assert_cuda(cudaMalloc((void**) &dev_rhs       ,4 * sizeof(float)));
    runtime_assert_cuda(cudaMalloc((void**) &dev_singular  ,1 * sizeof(int  )));

    // move data to gpu
    runtime_assert_cuda(cudaMemcpy(dev_csr_values, host_csr_values, 4 * sizeof(float), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_csr_col_id, host_csr_col_id, 4 * sizeof(int  ), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_csr_row_pt, host_csr_row_pt, 5 * sizeof(int  ), cudaMemcpyHostToDevice));
    runtime_assert_cuda(cudaMemcpy(dev_rhs       , host_rhs       , 4 * sizeof(float), cudaMemcpyHostToDevice));

    // create matrix descriptor
    cusparseMatDescr_t descr;
    runtime_assert_cuda(cusparseCreateMatDescr(&descr));
    runtime_assert_cuda(cusparseSetMatType     (descr, CUSPARSE_MATRIX_TYPE_GENERAL));
    runtime_assert_cuda(cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO    ));

    runtime_assert_cuda(cusolverSpScsrlsvchol(handle_cusolver_sp,
                                              4,
                                              4,
                                              descr,
                                              dev_csr_values,
                                              dev_csr_row_pt,
                                              dev_csr_col_id,
                                              dev_rhs,
                                              0,    // tolerance
                                              0,    // reorder
                                              dev_rhs,
                                              dev_singular));

The CSR arrays above describe a 4×4 identity matrix (a diagonal matrix of ones), so the solve should be trivial and the solution should simply equal the right-hand side.

I removed the memory deallocation, result retrieval, and other similar calls for brevity. The code seems straightforward, yet it segfaults, and the crash occurs specifically during the call to cusolverSpScsrlsvchol.
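
In case it matters: runtime_assert_cuda is just a thin error-checking wrapper around the returned status codes. Its exact implementation shouldn't be relevant here, but a minimal sketch of what it is assumed to do is:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cusparse.h>
#include <cusolverSp.h>

// Abort with a message if a CUDA runtime, cuSPARSE, or cuSOLVER call fails.
static void runtime_assert_cuda(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::abort();
    }
}

static void runtime_assert_cuda(cusparseStatus_t status) {
    if (status != CUSPARSE_STATUS_SUCCESS) {
        std::fprintf(stderr, "cuSPARSE error: %d\n", static_cast<int>(status));
        std::abort();
    }
}

static void runtime_assert_cuda(cusolverStatus_t status) {
    if (status != CUSOLVER_STATUS_SUCCESS) {
        std::fprintf(stderr, "cuSOLVER error: %d\n", static_cast<int>(status));
        std::abort();
    }
}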

I've been stuck on this problem for over a day and I can't figure out why it's not working. Any help would be greatly appreciated!

Upvotes: 1

Views: 114

Answers (1)

Homer512

Reputation: 13419

The API documentation for cusolverSpScsrlsvchol states that the singularity parameter must point to host memory, not device memory. The code above passes dev_singular, which is a device allocation.
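
Because the library dereferences that pointer on the host side, handing it a GPU address is a likely cause of the segmentation fault. A minimal sketch of the corrected call (reusing the handles, descriptor, and device arrays from the question) could look like this:

// singularity is an output parameter that must live in HOST memory
int singularity = 0;

runtime_assert_cuda(cusolverSpScsrlsvchol(handle_cusolver_sp,
                                          4,              // m
                                          4,              // nnz
                                          descr,
                                          dev_csr_values, // device
                                          dev_csr_row_pt, // device
                                          dev_csr_col_id, // device
                                          dev_rhs,        // b (device)
                                          0,              // tolerance
                                          0,              // reorder
                                          dev_rhs,        // x (device), as in the question
                                          &singularity)); // host pointer

After the call, singularity == -1 indicates the matrix is numerically non-singular; any other value is the first diagonal index at which the Cholesky factorization broke down under the given tolerance.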

Upvotes: 2
