Eigenvalues of a large sparse matrix in R

Question

I have some large (symmetric real) sparse matrix, and I would like to calculate a few of the largest and smallest (by magnitude) eigenvalues. I looked into Armadillo, but that links to ARPACK library, which is written in Fortran. Being a Widnows user, I don't have a Fortran compiler handy and so I decided to use ARPACK R package which comes precompiled for Windows.

My matrix is stored in MatrixMarket format, and in R, I read it as follows:

library(Matrix)
f <- file("file://./lambda.mtx", "r")
str(m <- readMM(f))
close(f)
# read sparse matrix

The matrix is symmetric, and only the lower half is actually stored in the mtx file. I have tried using a symmetric matrix, but I got nowhere. So I wrote the following (rather gruesome to me) code to form both halves of the sparse matrix:

str(A <- sparseMatrix(c(m@i, m@j), c(m@j, m@i), index1 = FALSE, x = c(m@x, m@x), giveCsparse = TRUE, use.last.ij = TRUE))
# do not treat it as symmetric, this doubles the memory but avoids
# conversion to a full matrix in eigs() so it is a good deal ...

str(nnzM <- length(m@x))
str(nnzA <- A@p[A@Dim[2] + 1])
if(nnzM * 2 - m@Dim[2] != nnzA) {
    stop("failed to form full matrix A")
}
if(A@x[1] != m@x[1]) {
    if(A@x[1] == 2 * m@x[1])
        warning("failed to form full matrix A: diagonal elements added")
    else
        warning("failed to form full matrix A: bad diagonal value")
}
# make sure the repeated values of the diagonal were eliminated instead of added (use.last.ij = TRUE)

Finally, I calculate Eigenvalues using rARPACK:

library(rARPACK)
str(le <- eigs(A, k = 5, which = "LM", opts = list(retvec = FALSE))) # or dsyMatrix
str(se <- eigs(A, k = 5, sigma = 0, opts = list(retvec = FALSE))) # or dsyMatrix
le$values[1]
se$values[se$nconv]

And this works on small matrices. But on large matrices, I get bad_alloc errors in eigs(). Note that I'm running 64-bit version of R and there is 16 GB of RAM in the box (the sparse matrix in question is 174515 x 174515, with 9363966 nonzeroes and has just over 300 MB in the mtx (text format)).

I had memory issues before when I was using the symmetric sparse matrix. My guess is that the matrix is not in a good format and still gets converted to a dense matrix somewhere. Otherwise I don't see how could it be needing so much memory. Any help would be appreciated, this is the first and only thing I've written in R so far.

Upon request, I can upload the matrix somewhere and share the link. I can calculate the eigenvalues of the same matrix in Matlab, but that's mostly a manual process and I have to transfer the matrix to another machine (also 16 GB of RAM, but the Matlab is 32-bit so in theory it has much more limited working space), and the machine happens to be in Europe while i'm in Australia so it takes forever. I'd much more prefer using R on my local machine, if at all possible.

EDIT

All the data and the script, required to replicate the problem is at https://sourceforge.net/projects/slamfrontend/files/data/eigenvalues/. Also, the process actually does allocate about 12 GB of memory before it dies.

EDIT 2

This is the output of the script. It actually crashes when calculating the smallest eigenvalue, the largest calculates quickly and without a crash:

E:\my-projects\Robotics\MonaschGit\wdir>"C:\Program Files\R\R-3.2.2\bin\x64\R" --no-save -q --slave  0



This might also mean that the problem is related to the shift and invert mode used for the smallest eigenvalue. Does it actually invert the matrix in the process? If so, I can see why it runs out of memory, inverse of such matrix would be completely dense.

the swine · Accepted Answer

After taking the discussion with the developer of the rARPACK package, it became clear that the problem is not in the matrix being converted to dense, but rather in the LU factorization having to significantly change the ordering of the sparse matrix to avoid numerical problems, and hence filling the matrix in considerably.

The new version of the package will use the LDLT factorization, which does not seem to suffer from this problem, and calculates the eigenvalues quickly.

Until the new version is released, it is possible to grab a copy of the C++ sources and use them with:

class CSymmetricSparseMatrix_InvProduct {
protected:
    typedef Eigen::SparseMatrix SpMat;
    typedef Eigen::SimplicialLDLT SpLDLTSolver;

    const size_t n;
    SpMat m_mat;
    SpLDLTSolver solver;

public:
    CSymmetricSparseMatrix_InvProduct(const cs *p_matrix)
        :m_mat(int(p_matrix->m), int(p_matrix->n)), n(p_matrix->n)
    {
        // could maybe use Eigen::internal::viewAsEigen(p_matrix);

        _ASSERTE(p_matrix->m == p_matrix->n);
        // must be square

        std::vector > triplets;
        triplets.reserve(p_matrix->p[p_matrix->n]);
        for(size_t i = 0; i < n; ++ i) {
            for(size_t p = p_matrix->p[i], e = p_matrix->p[i + 1]; p < e; ++ p) {
                double f_val = p_matrix->x[p];
                size_t n_row = p_matrix->i[p], n_col = i;
                triplets.push_back(Eigen::Triplet(int(n_row), int(n_col), f_val));
            }
        }
        // expand the CSparse matrix to triplets

        m_mat.setFromTriplets(triplets.begin(), triplets.end());
        // fill the Eigen matrix

        m_mat.makeCompressed();
        // compress the Eigen matrix
    }

    size_t rows() const
    {
        return n;
    }

    size_t cols() const
    {
        return n;
    }

    void set_shift(double sigma)
    {
        SpMat I((int)n, (int)n);
        I.setIdentity();

        solver.compute(m_mat - I * sigma); // also better use solver.setShift(-sigma, 1.0);

        /*size_t n_fac_nnz = ((SpMat)solver.matrixL()).nonZeros(); // type cast required, otherwise it loops forever
        fprintf(stderr, "debug: the LDLT factorization has " PRIsize " nonzeros
", n_fac_nnz);*/
    }

    // y_out = inv(A - sigma * I) * x_in
    void perform_op(const double *x_in, double *y_out) const
    {
        Eigen::Map x(x_in, n);
        Eigen::Map y(y_out, n);
        y.noalias() = solver.solve(x);
    }

private:
    CSymmetricSparseMatrix_InvProduct(const CSymmetricSparseMatrix_InvProduct &r_other); // no-copy
    CSymmetricSparseMatrix_InvProduct &operator =(const CSymmetricSparseMatrix_InvProduct &r_other); // no-copy
};

Eigenvalues of a large sparse matrix in R

Answers (1)

Related Questions