Reputation: 2593
I have a very large matrix (20000 * 20000) and I would like to compute its largest eigenvalue. When I do it in Matlab, it takes couple of seconds, but in R, it takes over an hour to compute. Currently I'm using rARPACK and it takes hours to compute.
library("rARPACK")
eigs_sym(cov(TS), k = 1, which = "LM", opts = list(retvec = FALSE))
Any alternative or solutions ?
Upvotes: 1
Views: 1370
Reputation: 8880
How are you creating your large matrix M? If it is produced as the product of two matrices say M = XX', remember that the first eigenvalues of XX' are equal to the first eigenvalues of X'X. This means that if X has many less rows than columns (or vice-versa) you could compute values from a much smaller matrix!
Code assuming M= XX' and X has 200 rows and 20 columns. Computing the first five eigenvalues from XX' (200 by 200) is around 30 times slower than computing from X'X (20 by 20):
N <- 200
p <- 20
X <- matrix(rnorm(N*p), N, p)
XXt <- tcrossprod(X)
XtX <- crossprod(X)
eigen(XXt)$values[1:5]
#> [1] 313.1307 288.9510 261.9685 260.7008 236.8953
eigen(XtX)$values[1:5]
#> [1] 313.1307 288.9510 261.9685 260.7008 236.8953
microbenchmark::microbenchmark(large_M = eigen(XXt)$values[1:5],
small_M = eigen(XtX)$values[1:5],
times =50,
check="equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> large_M 9178.500 9434.875 10759.3923 10354.18 11416.863 15569.834 50 b
#> small_M 233.601 247.493 325.8577 312.32 373.146 562.687 50 a
Created on 2021-08-27 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 8880
I was also surprised that RSpectra::eigs_sym
was not soooo much faster than base::eigen
. I did a benchmark comparing both over various sample sizes and number of eigenvalues.
In general, RSpectra::eigs_sym
is 3-5 times faster than base::eigen
. But given that base::eigen
will take an eternity for a 20000 * 20000, RSpectra::eigs_sym
still should take a good quarter of eternity!
library(RSpectra)
library(tidyverse)
library(microbenchmark)
library(clusterGeneration)
library(tictoc)
## generate S
S_dat <- tibble(size = c(50, 100, 500, 1000, 2000, 5000)) %>%
mutate(S=map(size, ~genPositiveDefMat(.)$Sigma))
### benvhmark function
bench_oneS <- function(S, n_eigens=5){
dim <- nrow(S)
n_rep <- case_when(dim<=1000 ~20,
TRUE~10)
microbenchmark("base::eigen" = eigen(S)$values[1:n_eigens],
"RSpectra::eigs_sym" =eigs_sym(S,n_eigens, opts = list(retvec = FALSE))$values,
times=n_rep) %>%
summary(a, unit="ms") %>%
as_tibble() %>%
mutate(expr = as.character(expr))
}
## do benchmark
tic()
bench_out <- S_dat %>%
mutate(bench_5=map(S, ~bench_oneS(.)),
bench_1=map(S, ~bench_oneS(., n_eigens=1))) %>%
dplyr::select(-S)%>%
pivot_longer(starts_with("bench"), names_to="n_eigens", values_to="bench",
names_prefix="bench_", names_transform=list(n_eigens=as.integer)) %>%
unnest(bench)
toc()
##
pl_ratio <- bench_out %>%
dplyr::select(size, n_eigens, expr, mean) %>%
spread(expr, mean) %>%
mutate(ratio=`base::eigen` /`RSpectra::eigs_sym`) %>%
ggplot(aes(x=size, y= ratio, linetype=factor(n_eigens))) +
geom_line()+
geom_point()+
labs(color="Method:", linetype="N eigens")+
theme(legend.position="bottom")+
ylab("Ratio time (eigen/eigs_sym)")+
xlab("Sample size")+
ggtitle("Ratio of average timing, base::eigen vs RSpectra::eigs_sym")
Upvotes: 1
Reputation: 356
rARPACK used to be a wrapper around the library ARPACK but is now a wrapper around Spectra an improved and reimplemented version of the same algorithms, which outperforms ARPACK in many tests.
As the matlab eigs
function is also a wrapper around the ARPACK package it seems unlikely that the issue is the solver if you have the same arguments in both cases.
Is it possible that the step taking the time is the calculation of the covariance matrix (i.e. cov(TS)
)?
Upvotes: 2