user8469759

Reputation: 2732

Solution of a sparse linear system on GPU, paper from Nvidia

I am reading through an Nvidia article about solving sparse linear systems on the GPU. I got stuck on the construction of the chainPtrHost data structure: I understand what it does, but not how it is built. The paper mentions that it depends on the available parallelism across the levels, but it doesn't seem to explain how that is determined.

Does anyone know how it can be constructed?

Paper for reference: Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU

Upvotes: 0

Views: 280

Answers (1)

paleonix

Reputation: 3048

My understanding is that one just iterates through the levels sequentially on the host, as in the following very basic, untested C++ snippet:

#include <vector>

// levels: CSR-style offsets, i.e. level i contains the rows
// [levels[i], levels[i + 1]); the returned chain pointers are 1-based
std::vector<int> create_chains(const std::vector<int> &levels, int threshold) {
    const int num_levels = static_cast<int>(std::size(levels) - 1);

    std::vector<int> chains{};
    chains.reserve(std::size(levels)); // optional optimization to avoid re-allocation

    bool no_open_chain = true;

    for (int idx = 0; idx < num_levels; ++idx) {
        const int num_rows = levels[idx + 1] - levels[idx];
        // a level with more rows than fit in one block becomes its own
        // single-link chain, to be handled by the multi-block kernels
        const bool new_single_link_chain = (num_rows > threshold);

        if (no_open_chain || new_single_link_chain) { // new chain starts here
            chains.push_back(idx + 1); // 1-based index
        }

        no_open_chain = new_single_link_chain;
    }
    chains.push_back(num_levels + 1); // 1-based end of last chain

    return chains;
}

The right value for threshold depends on the implementation of the single-block solver kernel. If each thread operates on one row and there is no restriction on the block size from occupancy, shared memory, etc., it could be 1024, i.e. the maximum number of threads (and therefore rows) per block.

Upvotes: 3
