Reputation: 9816
I would like to get the slope of a linear regression fit for each of 1M separate data sets (1M * 50 rows in a data.frame, or a 1M * 50 array). I am currently using the lm() function, which takes a very long time (about 10 min).
Is there a faster function for linear regression?
Upvotes: 33
Views: 14235
Reputation: 81
lmfit() in the Rfast package is even faster than .lm.fit(). Its only drawback is that it fails when the design matrix is not of full rank.
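Because a full-rank requirement is the limiting factor here, one could check the rank of the design matrix with base R's qr() before handing it to such a solver. This is a minimal sketch; has_full_rank() is a hypothetical helper, not part of Rfast or any other package:

```r
# Hypothetical helper: TRUE when the design matrix X has full column rank,
# using the rank reported by base R's qr() (tol matches qr()'s default).
has_full_rank <- function(X, tol = 1e-7) {
  qr(X, tol = tol)$rank == ncol(X)
}

x <- c(1, 2, 3, 4, 5)
X_ok  <- cbind(1, x)          # intercept + one predictor: full rank
X_bad <- cbind(1, x, 2 * x)   # third column is twice the second: rank deficient

has_full_rank(X_ok)   # TRUE
has_full_rank(X_bad)  # FALSE
```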
Upvotes: 8
Reputation: 6406
Since R 3.1.0 there is a .lm.fit() function, which should be faster than lm() and lm.fit().
It is described, and its performance compared with the other lm functions, here: https://rpubs.com/maechler/fast_lm.
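A minimal sketch of the .lm.fit() approach, assuming (hypothetically) that data set i is stored as row i of two numeric matrices X and Y, each n_sets x 50. The names X, Y and slopes_lm_fit() and the simulated data are illustrative only. The design matrix is built by hand with cbind(1, ...), so no formula is parsed:

```r
set.seed(1)
n_sets <- 1000   # 1e6 in the real problem
n_obs  <- 50
# Simulated data: true intercept 2, true slope 3, small noise
X <- matrix(rnorm(n_sets * n_obs), nrow = n_sets)
Y <- 2 + 3 * X + matrix(rnorm(n_sets * n_obs, sd = 0.1), nrow = n_sets)

# Hypothetical helper: slope of y ~ x for every row of X and Y
slopes_lm_fit <- function(X, Y) {
  vapply(seq_len(nrow(X)), function(i) {
    design <- cbind(1, X[i, ])   # intercept column + predictor
    # With a full-rank design .lm.fit() does not pivot,
    # so the second coefficient is the slope
    .lm.fit(design, Y[i, ])$coefficients[2]
  }, numeric(1))
}

slopes <- slopes_lm_fit(X, Y)
summary(slopes)   # should all be close to the true slope of 3
```

Note that .lm.fit() returns its coefficients in pivoted order when the design is rank deficient, so the "second coefficient is the slope" shortcut relies on a full-rank design.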
Upvotes: 17
Reputation: 371
speedlm() from the speedglm package should do it, as it is designed for large data sets.
Upvotes: 6
Reputation: 368251
Yes, there are:
R itself has lm.fit(), which is more bare-bones: no formula notation and a much simpler result set.
Several of our Rcpp-related packages have fastLm() implementations: RcppArmadillo, RcppEigen and RcppGSL.
We have described fastLm() in a number of blog posts and presentations. If you want the fastest approach, do not use the formula interface: parsing the formula and preparing the model matrix take more time than the actual regression.
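To illustrate skipping the formula interface with base R's lm.fit(), here is a minimal sketch that adds an assumption the question does not state: all data sets share the same 50 x values. Under that assumption lm.fit() accepts the responses as a matrix (one column per data set), so a single call with a design matrix built once fits every data set. Ymat, true_slopes and the simulated data are hypothetical:

```r
set.seed(2)
n_obs  <- 50
n_sets <- 1000   # 1e6 in the real problem
x <- seq_len(n_obs)                  # ASSUMPTION: same x for every data set
true_slopes <- runif(n_sets, -1, 1)
# Column j of Ymat is data set j: y = 5 + true_slopes[j] * x + noise
Ymat <- 5 + outer(x, true_slopes) + matrix(rnorm(n_obs * n_sets), nrow = n_obs)

design <- cbind(1, x)            # model matrix built once, no formula parsing
fit <- lm.fit(design, Ymat)      # matrix response: one fit per column
slopes <- fit$coefficients[2, ]  # row 2 holds the slope for each data set
```

If each data set has its own x values, this single call does not apply and the design matrix has to be rebuilt per data set.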
That said, if you are regressing a single vector on another single vector, you can simplify this further, as no matrix package is needed.
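With one predictor, the least-squares slope has the closed form slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2), i.e. cov(x, y) / var(x), so it can be computed for all data sets at once with vectorized row operations. A minimal sketch, assuming a hypothetical layout where X and Y are n_sets x n_obs matrices with row i holding data set i; row_slopes() is an illustrative helper name, not from any package:

```r
# Closed-form simple-regression slope for every row of X and Y at once
row_slopes <- function(X, Y) {
  # rowMeans() has length nrow(X) and is recycled down the columns,
  # so each row is centred by its own mean
  Xc <- X - rowMeans(X)
  Yc <- Y - rowMeans(Y)
  rowSums(Xc * Yc) / rowSums(Xc^2)
}

set.seed(3)
n_sets <- 1000   # 1e6 in the real problem
n_obs  <- 50
X <- matrix(runif(n_sets * n_obs), nrow = n_sets)
Y <- 1 - 2 * X + matrix(rnorm(n_sets * n_obs, sd = 0.01), nrow = n_sets)

slopes <- row_slopes(X, Y)
# Cross-check one data set against lm()
all.equal(slopes[1], unname(coef(lm(Y[1, ] ~ X[1, ]))[2]))  # TRUE
```

This avoids both the per-data-set loop and any QR decomposition, at the cost of only handling a single predictor with an intercept.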
Upvotes: 31