Reputation: 175
I would like to fit many regression models automatically and save their fitted values and residuals in the original data frame.
That is, I would like to fit every model in which one variable is the response and all the other variables are predictors, for example X1 ~ X2 + X3, X2 ~ X1 + X3, and X3 ~ X1 + X2,
and then add the fitted values and residuals of each model to the data.
My data look like this:
test <- data.frame(X1 = rnorm(50, mean = 50, sd = 10),
                   X2 = rnorm(50, mean = 5, sd = 1.5),
                   X3 = rnorm(50, mean = 200, sd = 25))
test$X1[10] <- 5
test$X2[10] <- 5
test$X3[10] <- 530
I run all possible regression models.
varlist <- names(test)
# One model per column: that column regressed on all the other columns
models <- lapply(varlist, function(x) {
  lm(substitute(i ~ ., list(i = as.name(x))), data = test)
})
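As an alternative to `substitute()`, each formula can be built with base R's `reformulate()`, which turns a response name and the term "." into `X1 ~ .` and so on. The following is a minimal sketch, assuming simulated data like the example above (with `set.seed(1)` added so it is reproducible); naming the list by response variable is an illustrative choice, not something the question requires.

```r
# Sketch: fit one lm per column, each column regressed on all the others.
set.seed(1)
test <- data.frame(X1 = rnorm(50, mean = 50, sd = 10),
                   X2 = rnorm(50, mean = 5, sd = 1.5),
                   X3 = rnorm(50, mean = 200, sd = 25))

varlist <- names(test)
# reformulate(".", response = "X1") builds the formula X1 ~ .
# and "." expands to all remaining columns of `data`
models <- lapply(varlist, function(x)
  lm(reformulate(".", response = x), data = test))
names(models) <- varlist  # illustrative: label each model by its response

# Inspect which predictors each model used
lapply(models, function(m) names(coef(m)))
```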
I can extract the fitted values and residuals from each model:
lapply(models,residuals)
lapply(models, fitted)
However, I would like to save all residuals and fitted values in the original data frame. Is it possible to build a final data frame with these columns?
X1 X2 X3 Residual1 Residual2 Residual3 Fitted1 Fitted2 Fitted3
so that Residual1 comes from model 1, Residual2 from model 2, and so on.
Upvotes: 1
Views: 413
Reputation: 18437
There is probably a more compact way to do this, but you can try something like this:
set.seed(1)
test <- data.frame(X1 = rnorm(50, mean = 50, sd = 10),
X2 = rnorm(50, mean = 5, sd = 1.5),
X3 = rnorm(50, mean = 200, sd = 25))
test$X1[10] <- 5
test$X2[10] <- 5
test$X3[10] <- 530
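# Regress each column on all the others and keep the fitted values
fitted_list <- lapply(names(test), function(x)
  fitted(lm(as.formula(paste(x, ".", sep = "~")),
            data = test)))
# Fit the same models again and keep the residuals
resid_list <- lapply(names(test), function(x)
  resid(lm(as.formula(paste(x, ".", sep = "~")),
           data = test)))
# Bind into a 50 x 6 matrix (fitted values first, then residuals)
res <- do.call(cbind, c(fitted_list, resid_list))
# Attach to the original data and give the columns readable names
res <- cbind(test, res)
names(res) <- paste0(rep(c("X", "Fitted", "Resid"), each = 3), rep(1:3, 3))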
str(res)
## 'data.frame': 50 obs. of 9 variables:
## $ X1 : num 43.7 51.8 41.6 66 53.3 ...
## $ X2 : num 5.6 4.08 5.51 3.31 7.15 ...
## $ X3 : num 184 201 177 204 184 ...
## $ Fitted1: num 52 50.5 52.8 50.3 51.8 ...
## $ Fitted2: num 5.23 5.17 5.25 5.09 5.18 ...
## $ Fitted3: num 219 198 225 161 192 ...
## $ Resid1 : num -8.28 1.35 -11.2 15.64 1.49 ...
## $ Resid2 : num 0.367 -1.09 0.264 -1.788 1.97 ...
## $ Resid3 : num -34.47 2.75 -47.44 43.11 -8.33 ...
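The code above fits each model twice, once for the fitted values and once for the residuals. A variant that fits each model once and produces the column order requested in the question (residuals first, then fitted values) might look like the sketch below. It relies on `sapply()` simplifying equal-length vectors into a matrix with one column per model; `resid_mat` and `fitted_mat` are illustrative names.

```r
# Sketch: fit each model once, then add Residual1..3 and Fitted1..3 columns.
set.seed(1)
test <- data.frame(X1 = rnorm(50, mean = 50, sd = 10),
                   X2 = rnorm(50, mean = 5, sd = 1.5),
                   X3 = rnorm(50, mean = 200, sd = 25))
test$X1[10] <- 5
test$X2[10] <- 5
test$X3[10] <- 530

# One lm per column, each column regressed on all the others
models <- lapply(names(test), function(x)
  lm(as.formula(paste(x, ".", sep = "~")), data = test))

# sapply() returns a 50 x 3 matrix: one column per model
resid_mat  <- sapply(models, residuals)
fitted_mat <- sapply(models, fitted)
colnames(resid_mat)  <- paste0("Residual", seq_along(models))
colnames(fitted_mat) <- paste0("Fitted", seq_along(models))

# Column-bind onto the original data in the requested order
res <- cbind(test, resid_mat, fitted_mat)
names(res)
```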
Upvotes: 1
Reputation: 952
Your code under "I run all possible regression models" does not run as originally posted, but assuming it is just an example, how about saving lapply(models, residuals) and lapply(models, fitted) as variables and column-binding them to the original dataset? Then loop over the models, adding one residual column and one fitted column at a time:
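models_residuals <- lapply(models, residuals)
models_fitted <- lapply(models, fitted)
# Both are lists with one vector per model, so loop over the models
# and assign back to test (cbind() on its own does not modify test)
for (i in seq_along(models)) {
  test[[paste0("Residual", i)]] <- models_residuals[[i]]
  test[[paste0("Fitted", i)]] <- models_fitted[[i]]
}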
Let me know if my idea of what you want is correct!
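One caveat when attaching residuals to the original rows: by default `lm()` drops rows with missing values (`na.action = na.omit`), so `residuals()` and `fitted()` return vectors shorter than the data and no longer line up with its rows. Fitting with `na.action = na.exclude` pads them with `NA` back to the full row count. The example data have no missing values, so this is only a sketch for data that do; the `NA` placed in row 3 below is hypothetical.

```r
# Sketch: keep residuals/fitted values aligned with rows when data contain NAs.
set.seed(1)
test <- data.frame(X1 = rnorm(50, mean = 50, sd = 10),
                   X2 = rnorm(50, mean = 5, sd = 1.5),
                   X3 = rnorm(50, mean = 200, sd = 25))
test$X2[3] <- NA  # hypothetical missing value, for illustration only

# Default na.action (na.omit): row 3 is dropped, so only 49 residuals come back
length(residuals(lm(X1 ~ ., data = test)))

# Fit all models before adding any new columns, so "." sees only X1..X3
models <- lapply(names(test), function(x)
  lm(as.formula(paste(x, ".", sep = "~")), data = test,
     na.action = na.exclude))

# With na.exclude, residuals() and fitted() are padded with NA to 50 rows
for (i in seq_along(models)) {
  test[[paste0("Residual", i)]] <- residuals(models[[i]])
  test[[paste0("Fitted", i)]] <- fitted(models[[i]])
}
```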
Upvotes: 1