stevejb

Reputation: 2444

What is the best way to run a loop of regressions in R?

Assume that I have sources of data X and Y that are indexable, say matrices. And I want to run a set of independent regressions and store the result. My initial approach would be

results = matrix(nrow=nrow(X), ncol=2)
for(i in 1:nrow(X)) {
        # one regression per row; store intercept and slope
        results[i,] = coefficients(lm(Y[i,] ~ X[i,]))
}

But loops are bad, so I could do it with lapply instead:

out <- lapply(1:nrow(X), function(i) { coefficients(lm(Y[i,] ~ X[i,])) } )

Is there a better way to do this?

Upvotes: 3

Views: 5029

Answers (3)

B. Whitcher

Reputation: 366

If you just want to perform straightforward multiple linear regression, then I would recommend not using lm(). There is lsfit(), but I'm not sure it would offer that much of a speed-up (I have never performed a formal comparison). Instead I would recommend computing (X'X)^{-1}X'y using qr() and qr.coef(). This will allow you to perform multivariate multiple linear regression; that is, treating the response variable as a matrix instead of a vector and applying the same regression to each vector of observations.

Z # design matrix
Y # matrix of observations (one response vector per column, nrow(Y) == nrow(Z); use t(Y) if they are stored in rows)
## Estimation via multivariate multiple linear regression                    
beta <- qr.coef(qr(Z), Y)
## Fitted values                                                             
Yhat <- Z %*% beta
## Residuals                                                                 
u <- Y - Yhat

In your example, is there a different design matrix per vector of observations? If so, you may be able to modify Z in order to still accommodate this.
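As a rough check of the approach above, here is a minimal sketch on simulated data. The sizes, seed, and variable names (n, k, x, beta_qr, beta_lm) are made up for illustration, and it assumes every response series shares the same design matrix Z:

```r
set.seed(1)                          # simulated data, purely illustrative
n <- 50                              # observations per series (assumed)
k <- 4                               # number of response series (assumed)
x <- rnorm(n)
Z <- cbind(1, x)                     # design matrix with an intercept column
# n x k response matrix: one series per column
Y <- sapply(1:k, function(j) 1 + j * x + rnorm(n))

# All k regressions from a single QR decomposition of Z
beta_qr <- qr.coef(qr(Z), Y)         # 2 x k matrix: row 1 intercepts, row 2 slopes

# Same coefficients from k separate lm() calls, for comparison
beta_lm <- sapply(1:k, function(j) coef(lm(Y[, j] ~ x)))

# TRUE if the two agree up to numerical tolerance
isTRUE(all.equal(unname(beta_qr), unname(beta_lm)))
```

Note that lm() itself accepts a matrix response, so coef(lm(Y ~ x)) should give the same 2 x k matrix if you prefer to stay with the formula interface.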

Upvotes: 0

JD Long

Reputation: 60746

I do this type of thing with plyr, but I agree that it's not a processing efficiency issue so much as a question of what you are comfortable reading and writing.

Upvotes: 1

KT.

Reputation: 11430

You are certainly overoptimizing here. The overhead of a loop is negligible compared to the procedure of model fitting and therefore the simple answer is - use whatever way you find to be the most understandable. I'd go for the for-loop, but lapply is fine too.
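For what it's worth, a minimal sketch of the two styles side by side, on simulated data (the dimensions, seed, and the helper name fit_row are assumptions for illustration, not part of the question). Both produce the same matrix of coefficients; vapply just returns a matrix directly instead of a list:

```r
set.seed(42)                                   # simulated data, purely illustrative
X <- matrix(rnorm(20 * 30), nrow = 20)         # 20 rows, 30 observations each (assumed)
Y <- 2 + 3 * X + matrix(rnorm(20 * 30), nrow = 20)

# Plain for-loop with a preallocated result matrix
res_loop <- matrix(NA_real_, nrow = nrow(X), ncol = 2)
for (i in seq_len(nrow(X))) {
  res_loop[i, ] <- coef(lm(Y[i, ] ~ X[i, ]))
}

# Same thing with vapply; fit_row is a hypothetical helper
fit_row <- function(i) unname(coef(lm(Y[i, ] ~ X[i, ])))
res_vapply <- t(vapply(seq_len(nrow(X)), fit_row, numeric(2)))

colnames(res_loop) <- colnames(res_vapply) <- c("intercept", "slope")
isTRUE(all.equal(res_loop, res_vapply))        # both styles should agree
```

If you want to see where the time goes, wrapping each version in system.time() is a quick way to compare them on your own data.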

Upvotes: 6
