Reputation: 11031
I have some large (symmetric real) sparse matrix, and I would like to calculate a few of the largest and smallest (by magnitude) eigenvalues. I looked into Armadillo, but that links to ARPACK library, which is written in Fortran. Being a Widnows user, I don't have a Fortran compiler handy and so I decided to use ARPACK R package which comes precompiled for Windows.
My matrix is stored in MatrixMarket format, and in R, I read it as follows:
library(Matrix)
f <- file("file://./lambda.mtx", "r")
str(m <- readMM(f))
close(f)
# read sparse matrix
The matrix is symmetric, and only the lower half is actually stored in the mtx file. I have tried using a symmetric matrix, but I got nowhere. So I wrote the following (rather gruesome to me) code to form both halves of the sparse matrix:
str(A <- sparseMatrix(c(m@i, m@j), c(m@j, m@i), index1 = FALSE, x = c(m@x, m@x), giveCsparse = TRUE, use.last.ij = TRUE))
# do not treat it as symmetric, this doubles the memory but avoids
# conversion to a full matrix in eigs() so it is a good deal ...
str(nnzM <- length(m@x))
str(nnzA <- A@p[A@Dim[2] + 1])
if(nnzM * 2 - m@Dim[2] != nnzA) {
stop("failed to form full matrix A")
}
if(A@x[1] != m@x[1]) {
if(A@x[1] == 2 * m@x[1])
warning("failed to form full matrix A: diagonal elements added")
else
warning("failed to form full matrix A: bad diagonal value")
}
# make sure the repeated values of the diagonal were eliminated instead of added (use.last.ij = TRUE)
Finally, I calculate Eigenvalues using rARPACK:
library(rARPACK)
str(le <- eigs(A, k = 5, which = "LM", opts = list(retvec = FALSE))) # or dsyMatrix
str(se <- eigs(A, k = 5, sigma = 0, opts = list(retvec = FALSE))) # or dsyMatrix
le$values[1]
se$values[se$nconv]
And this works on small matrices. But on large matrices, I get bad_alloc
errors in eigs()
. Note that I'm running 64-bit version of R and there is 16 GB of RAM in the box (the sparse matrix in question is 174515 x 174515, with 9363966 nonzeroes and has just over 300 MB in the mtx (text format)).
I had memory issues before when I was using the symmetric sparse matrix. My guess is that the matrix is not in a good format and still gets converted to a dense matrix somewhere. Otherwise I don't see how could it be needing so much memory. Any help would be appreciated, this is the first and only thing I've written in R so far.
Upon request, I can upload the matrix somewhere and share the link. I can calculate the eigenvalues of the same matrix in Matlab, but that's mostly a manual process and I have to transfer the matrix to another machine (also 16 GB of RAM, but the Matlab is 32-bit so in theory it has much more limited working space), and the machine happens to be in Europe while i'm in Australia so it takes forever. I'd much more prefer using R on my local machine, if at all possible.
EDIT
All the data and the script, required to replicate the problem is at https://sourceforge.net/projects/slamfrontend/files/data/eigenvalues/. Also, the process actually does allocate about 12 GB of memory before it dies.
EDIT 2
This is the output of the script. It actually crashes when calculating the smallest eigenvalue, the largest calculates quickly and without a crash:
E:\my-projects\Robotics\MonaschGit\wdir>"C:\Program Files\R\R-3.2.2\bin\x64\R" --no-save -q --slave 0<get_lambda_eigenvalues.R
Formal class 'dsTMatrix' [package "Matrix"] with 7 slots
..@ i : int [1:9363966] 0 1 2 3 4 5 6 1 2 3 ...
..@ j : int [1:9363966] 0 0 0 0 0 0 0 1 1 1 ...
..@ Dim : int [1:2] 174515 174515
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ x : num [1:9363966] 1.62e+08 -2.49e+01 -2.49e+06 2.19e+07 1.79e+09 ...
..@ uplo : chr "L"
..@ factors : list()
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:18553417] 0 1 2 3 4 5 6 7 8 9 ...
..@ p : int [1:174516] 0 14362 28724 43086 57448 71810 86172 100534 115525 130516 ...
..@ Dim : int [1:2] 174515 174515
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ x : num [1:18553417] 1.62e+08 -2.49e+01 -2.49e+06 2.19e+07 1.79e+09 ...
..@ factors : list()
int 9363966
int 18553417
List of 4
$ values : num [1:5] 1.38e+11 1.31e+11 1.30e+11 1.22e+11 1.16e+11
$ vectors: NULL
$ nconv : int 5
$ niter : int 60
Error: std::bad_alloc
Execution halted
This might also mean that the problem is related to the shift and invert mode used for the smallest eigenvalue. Does it actually invert the matrix in the process? If so, I can see why it runs out of memory, inverse of such matrix would be completely dense.
Upvotes: 2
Views: 1468
Reputation: 11031
After taking the discussion with the developer of the rARPACK package, it became clear that the problem is not in the matrix being converted to dense, but rather in the LU factorization having to significantly change the ordering of the sparse matrix to avoid numerical problems, and hence filling the matrix in considerably.
The new version of the package will use the LDLT factorization, which does not seem to suffer from this problem, and calculates the eigenvalues quickly.
Until the new version is released, it is possible to grab a copy of the C++ sources and use them with:
class CSymmetricSparseMatrix_InvProduct {
protected:
typedef Eigen::SparseMatrix<double, Eigen::ColMajor> SpMat;
typedef Eigen::SimplicialLDLT<SpMat> SpLDLTSolver;
const size_t n;
SpMat m_mat;
SpLDLTSolver solver;
public:
CSymmetricSparseMatrix_InvProduct(const cs *p_matrix)
:m_mat(int(p_matrix->m), int(p_matrix->n)), n(p_matrix->n)
{
// could maybe use Eigen::internal::viewAsEigen(p_matrix);
_ASSERTE(p_matrix->m == p_matrix->n);
// must be square
std::vector<Eigen::Triplet<double> > triplets;
triplets.reserve(p_matrix->p[p_matrix->n]);
for(size_t i = 0; i < n; ++ i) {
for(size_t p = p_matrix->p[i], e = p_matrix->p[i + 1]; p < e; ++ p) {
double f_val = p_matrix->x[p];
size_t n_row = p_matrix->i[p], n_col = i;
triplets.push_back(Eigen::Triplet<double>(int(n_row), int(n_col), f_val));
}
}
// expand the CSparse matrix to triplets
m_mat.setFromTriplets(triplets.begin(), triplets.end());
// fill the Eigen matrix
m_mat.makeCompressed();
// compress the Eigen matrix
}
size_t rows() const
{
return n;
}
size_t cols() const
{
return n;
}
void set_shift(double sigma)
{
SpMat I((int)n, (int)n);
I.setIdentity();
solver.compute(m_mat - I * sigma); // also better use solver.setShift(-sigma, 1.0);
/*size_t n_fac_nnz = ((SpMat)solver.matrixL()).nonZeros(); // type cast required, otherwise it loops forever
fprintf(stderr, "debug: the LDLT factorization has " PRIsize " nonzeros\n", n_fac_nnz);*/
}
// y_out = inv(A - sigma * I) * x_in
void perform_op(const double *x_in, double *y_out) const
{
Eigen::Map<const Eigen::VectorXd, Eigen::DontAlign> x(x_in, n);
Eigen::Map<Eigen::VectorXd, Eigen::DontAlign> y(y_out, n);
y.noalias() = solver.solve(x);
}
private:
CSymmetricSparseMatrix_InvProduct(const CSymmetricSparseMatrix_InvProduct &r_other); // no-copy
CSymmetricSparseMatrix_InvProduct &operator =(const CSymmetricSparseMatrix_InvProduct &r_other); // no-copy
};
Upvotes: 1