Reputation: 4456
I tried to run anova on different sets of data and didn't quite know how to do it. I goolged and found this to be useful: https://stats.idre.ucla.edu/r/codefragments/looping_strings/
hsb2 <- read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
names(hsb2)
varlist <- names(hsb2)[8:11]
models <- lapply(varlist, function(x) {
lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})
My understanding of what the above codes does is it creates a function lm() and apply it to each variable in varlist and it does linear regression on each of them.
So I thought use aov instead of lm would work for me like this:
aov(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
However, I got this error:
Error in terms.default(formula, "Error", data = data) :
no terms component nor attribute
I have no idea of where the error comes from. Please help!
Upvotes: 6
Views: 629
Reputation: 206486
The problem is that substitute()
returns an expression, not a formula. I think @thelatemail's suggestion of
lm(as.formula(paste("read ~",x)), data = hsb2)
is a good work around. Alternatively you could evaluate the expression to get the formula with
models <- lapply(varlist, function(x) {
aov(eval(substitute(read ~ i, list(i = as.name(x)))), data = hsb2)
})
I guess it depends on what you want to do with the list of models afterward. Doing
models <- lapply(varlist, function(x) {
eval(bquote(aov(read ~ .(as.name(x)), data = hsb2)))
})
gives a "cleaner" call
property for each of the result.
Upvotes: 6
Reputation: 99361
akrun borrowed my answer the other night, now I'm (partially) borrowing his.
do.call
puts the variables into the call
output so it reads properly. Here's a general function for simple regression.
doModel <- function(col1, col2, data = hsb2, FUNC = "lm")
{
form <- as.formula(paste(col1, "~", col2))
do.call(FUNC, list(form, substitute(data)))
}
lapply(varlist, doModel, col1 = "read")
# [[1]]
#
# Call:
# lm(formula = read ~ write, data = hsb2)
#
# Coefficients:
# (Intercept) write
# 18.1622 0.6455
#
#
# [[2]]
#
# Call:
# lm(formula = read ~ math, data = hsb2)
#
# Coefficients:
# (Intercept) math
# 14.0725 0.7248
#
# ...
# ...
# ...
Note: As thelatemail mentions in his comment
sapply(varlist, doModel, col1 = "read", simplify = FALSE)
will keep the names in the list and also allow list$name
subsetting.
Upvotes: 4
Reputation: 263451
This should do it. The varlist vector is going to be passed item by item to the function and the column will be delivered. The lm function will only see a two column dataframe and the "read" column will be the dependent variable each time. No need for fancy substitution:
models <- sapply(varlist, function(x) {
lm(read ~ ., data = hsb2[, c("read", x) ])
}, simplify=FALSE)
> summary(models[[1]]) # The first model. Note the use of "[["
Call:
lm(formula = read ~ ., data = hsb2[, c("read", x)])
Residuals:
Min 1Q Median 3Q Max
-19.8565 -5.8976 -0.8565 5.5801 24.2703
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.16215 3.30716 5.492 1.21e-07 ***
write 0.64553 0.06168 10.465 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.248 on 198 degrees of freedom
Multiple R-squared: 0.3561, Adjusted R-squared: 0.3529
F-statistic: 109.5 on 1 and 198 DF, p-value: < 2.2e-16
For all the models::
lapply(models, summary)
Upvotes: 5