Reputation: 2617
I am running simulations and fitting linear models with .lm.fit()
. Although extremely fast, this function does not provide the predictors' p-values. Is there a fast way to compute them (maybe from the values returned by .lm.fit()
)? I am aware of this method to calculate approximate p-values, but I need the exact ones.
Update:
Dirk Eddelbuettel provided the fastest way to fit the lm and Ben Bolker a method to compute the p-values, by combining both answers we get:
set.seed(101)
X <- cbind(1,matrix(1:10))
y <- rnorm(10)
mdl <- RcppArmadillo::fastLmPure(X, y)
pval <- 2*pt(abs(mdl$coefficients/mdl$stderr), mdl$df.residual, lower.tail=FALSE)
Upvotes: 5
Views: 1735
Reputation: 226192
Dirk's answer will be faster, but in case it's convenient here's the implementation in pure R (extracting the bits you need from summary.lm
, and assuming there are no issues with non-full-rank model matrices etc.)
Example:
set.seed(101)
X <- cbind(1,matrix(1:10))
y <- rnorm(10)
m <- .lm.fit(X,y)
p-value calcs:
rss <- sum(m$residuals^2)
rdf <- length(y) - ncol(X)
resvar <- rss/rdf
R <- chol2inv(m$qr)
se <- sqrt(diag(R) * resvar)
2*pt(abs(m$coef/se),rdf,lower.tail=FALSE)
Compare with:
coef(summary(lm(y~X-1)))[,"Pr(>|t|)"]
Upvotes: 6
Reputation: 368251
For this issue (of getting standard errors and hence p-values) I wrote three different versions of a function fastLm()
in package RcppArmadillo, RcppEigen and RcppGSL. Part of it is of course also just for exposition. But you could start there. Make sure you use the fastLmPure()
variants taking a vector and matrix and not the formula interface -- all the time is spent deparsing the formula.
Here, just for kicks, is the RcppArmadillo variant:
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::export]]
List fastLm_impl(const arma::mat& X, const arma::colvec& y) {
int n = X.n_rows, k = X.n_cols;
arma::colvec coef = arma::solve(X, y); // fit model y ~ X
arma::colvec res = y - X*coef; // residuals
// std.errors of coefficients
double s2 = std::inner_product(res.begin(), res.end(),
res.begin(), 0.0)/(n - k);
arma::colvec std_err =
arma::sqrt(s2 *
arma::diagvec(arma::pinv(arma::trans(X)*X)));
return List::create(Named("coefficients") = coef,
Named("stderr") = std_err,
Named("df.residual") = n - k);
}
Upvotes: 2