olala

substitute() in R together with ANOVA (aov)

I tried to run ANOVA on different sets of data and didn't quite know how to do it. I googled and found this page useful: https://stats.idre.ucla.edu/r/codefragments/looping_strings/

hsb2 <- read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
names(hsb2)
varlist <- names(hsb2)[8:11]
models <- lapply(varlist, function(x) {
  lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})

My understanding is that the code above defines an anonymous function that calls lm(), applies it to each variable in varlist, and so fits a separate linear regression of read on each of them.
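To see what each step actually produces, here is a small sketch. It assumes R's built-in mtcars data in place of hsb2 (so nothing has to be downloaded), with mpg playing the role of read and the columns wt, hp and qsec standing in for varlist. Note that substitute() yields an unevaluated call rather than a formula object, which turns out to matter for aov().

```r
# Assumption: built-in mtcars replaces hsb2; mpg plays the role of "read".
predictors <- c("wt", "hp", "qsec")

# substitute() swaps the placeholder symbol `i` for each column name,
# giving one unevaluated call per predictor (the first is mpg ~ wt).
exprs <- lapply(predictors, function(x) {
  substitute(mpg ~ i, list(i = as.name(x)))
})
print(exprs[[1]])
print(class(exprs[[1]]))  # a language object of class "call", not "formula"

# lapply() then fits one lm() per call and collects the fits in a list.
models <- lapply(exprs, function(e) lm(e, data = mtcars))
print(length(models))
```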

So I thought using aov instead of lm would work for me, like this:

aov(substitute(read ~ i, list(i = as.name(x))), data = hsb2)

However, I got this error:

Error in terms.default(formula, "Error", data = data) : 
no terms component nor attribute

I have no idea where the error comes from. Please help!

Answers (3)

MrFlick

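The problem is that substitute() returns an unevaluated call (a language object), not a formula. lm() happens to accept that, but aov() passes its formula argument straight to terms(); for an object of class "call" this dispatches to terms.default(), which raises the error you saw. I think @thelatemail's suggestion of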

lm(as.formula(paste("read ~",x)), data = hsb2)

is a good workaround, and it works the same way with aov() in place of lm(). Alternatively, you could evaluate the expression to get the formula with

models <- lapply(varlist, function(x) {
    aov(eval(substitute(read ~ i, list(i = as.name(x)))), data = hsb2)
})
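The diagnosis can be checked directly. This sketch again assumes mtcars in place of hsb2; it captures the error aov() raises when handed the unevaluated call, then shows that wrapping the call in eval() produces a real formula that aov() accepts.

```r
# Assumption: built-in mtcars replaces hsb2; mpg plays the role of "read".
expr <- substitute(mpg ~ i, list(i = as.name("wt")))

print(class(expr))        # "call": an unevaluated language object
print(class(eval(expr)))  # "formula": evaluating the call creates the formula

# aov() passes its first argument to terms(); for a "call" that dispatches
# to terms.default(), which stops with an error instead of fitting.
aov_error <- tryCatch(aov(expr, data = mtcars),
                      error = function(e) conditionMessage(e))
print(aov_error)

# With eval() the argument is a genuine formula, so aov() fits the model.
fit <- aov(eval(expr), data = mtcars)
print(summary(fit))
```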

I guess it depends on what you want to do with the list of models afterward. Doing

models <- lapply(varlist, function(x) {
    eval(bquote(aov(read ~ .(as.name(x)), data = hsb2)))
})

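gives a "cleaner" call property for each of the results: the stored call reads aov(formula = read ~ write, data = hsb2) rather than containing the eval(substitute(...)) expression.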
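The difference in the stored call is easy to see side by side. A minimal sketch, again assuming mtcars as stand-in data: both approaches fit the same models, but only the bquote() version records the substituted formula in $call.

```r
# Assumption: built-in mtcars replaces hsb2; mpg plays the role of "read".
predictors <- c("wt", "hp")

# eval(substitute(...)): the formula is built inside the aov() call, so the
# stored call still contains the eval(substitute(...)) expression.
models_eval <- lapply(predictors, function(x) {
  aov(eval(substitute(mpg ~ i, list(i = as.name(x)))), data = mtcars)
})

# bquote(): .() splices the column name in before aov() runs, so aov()
# sees and records a plain call such as aov(mpg ~ wt, data = mtcars).
models_bquote <- lapply(predictors, function(x) {
  eval(bquote(aov(mpg ~ .(as.name(x)), data = mtcars)))
})

print(models_eval[[1]]$call)
print(models_bquote[[1]]$call)

# The fitted coefficients are the same either way.
print(all.equal(coef(models_eval[[1]]), coef(models_bquote[[1]])))
```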

Rich Scriven

akrun borrowed my answer the other night; now I'm (partially) borrowing his.

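do.call() builds the call from the already-evaluated arguments, so the formula itself (e.g. read ~ write) and the name of the data set end up in the stored call, and the printed Call: reads properly. Here's a general function for simple regression.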

doModel <- function(col1, col2, data = hsb2, FUNC = "lm") 
{
    form <- as.formula(paste(col1, "~", col2))
    do.call(FUNC, list(form, substitute(data)))
}     

lapply(varlist, doModel, col1 = "read")
# [[1]]
#
# Call:
# lm(formula = read ~ write, data = hsb2)
#
# Coefficients:
# (Intercept)        write  
#     18.1622       0.6455  
#
#
# [[2]]
#
# Call:
# lm(formula = read ~ math, data = hsb2)
#
# Coefficients:
# (Intercept)         math  
#     14.0725       0.7248  
#
# ...
# ...
# ...
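Because FUNC is passed to do.call() as a string, the same helper also covers the aov() case from the question. A sketch under one assumption: the default data argument is switched to the built-in mtcars (with mpg as the response) so that it runs without downloading hsb2.

```r
# doModel() as above, with the default data set switched to the built-in
# mtcars (an assumption, so the sketch runs without downloading hsb2).
doModel <- function(col1, col2, data = mtcars, FUNC = "lm") {
  form <- as.formula(paste(col1, "~", col2))
  # substitute(data) passes the *name* of the data set (here the default
  # expression `mtcars`) rather than its contents, so the call stays short.
  do.call(FUNC, list(form, substitute(data)))
}

predictors <- c("wt", "hp")
aov_models <- lapply(predictors, doModel, col1 = "mpg", FUNC = "aov")

print(aov_models[[1]]$call)  # shows the actual formula and the data name
print(summary(aov_models[[1]]))
```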

Note: as thelatemail mentions in his comment,

sapply(varlist, doModel, col1 = "read", simplify = FALSE)

keeps the variable names as the names of the result list, which also allows $ subsetting by name (e.g. models$write when the result is stored in models).
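A small sketch of that naming behaviour, assuming mtcars in place of hsb2 and a formula built with as.formula(paste(...)) as in thelatemail's suggestion. The lapply(setNames(nm = ...), ...) variant at the end is a swapped-in equivalent, not part of the original answer.

```r
# Assumption: built-in mtcars replaces hsb2; mpg plays the role of "read".
predictors <- c("wt", "hp")

fit_one <- function(x) lm(as.formula(paste("mpg ~", x)), data = mtcars)

# sapply() on a character vector uses the strings as names (USE.NAMES = TRUE),
# and simplify = FALSE keeps the result as a plain list of model objects.
models <- sapply(predictors, fit_one, simplify = FALSE)
print(names(models))
print(coef(models$hp))  # subset by variable name instead of by position

# Equivalent: name the input vector first, then use lapply().
models2 <- lapply(setNames(nm = predictors), fit_one)
print(names(models2))
```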

IRTFM

This should do it. Each element of varlist (a column name) is passed in turn to the function, which uses it to select that column together with "read". lm() therefore only ever sees a two-column data frame, and in the formula read ~ . the "read" column is the dependent variable each time while the dot stands for the one remaining column. No need for fancy substitution:

models <- sapply(varlist, function(x) {
  lm(read ~ ., data = hsb2[, c("read", x)])
}, simplify = FALSE)

> summary(models[[1]])  # The first model. Note the use of "[["

Call:
lm(formula = read ~ ., data = hsb2[, c("read", x)])

Residuals:
     Min       1Q   Median       3Q      Max 
-19.8565  -5.8976  -0.8565   5.5801  24.2703 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 18.16215    3.30716   5.492 1.21e-07 ***
write        0.64553    0.06168  10.465  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 8.248 on 198 degrees of freedom
Multiple R-squared: 0.3561, Adjusted R-squared: 0.3529 
F-statistic: 109.5 on 1 and 198 DF,  p-value: < 2.2e-16 
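The same column-subsetting idea carries over to aov(), which is what the question asked for. A sketch assuming mtcars in place of hsb2; as the summary above shows, the stored call contains the subsetting expression rather than the predictor's name, so the list names are the easiest way to tell the models apart.

```r
# Assumption: built-in mtcars replaces hsb2; mpg plays the role of "read".
predictors <- c("wt", "hp")

aov_models <- sapply(predictors, function(x) {
  # Only mpg and one predictor are kept, so `mpg ~ .` has a single term.
  aov(mpg ~ ., data = mtcars[, c("mpg", x)])
}, simplify = FALSE)

print(aov_models$wt)  # list names identify each model
print(summary(aov_models$hp))
```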

For all the models:

lapply(models, summary)
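lapply(models, summary) prints every summary in full; to compare the models side by side it can help to pull out a few numbers instead. A sketch assuming mtcars as stand-in data and lm() fits built the same way as above; it relies only on the coefficient matrix (with its "Estimate" and "Pr(>|t|)" columns) and the r.squared element that summary() returns for lm fits.

```r
# Assumption: built-in mtcars replaces hsb2; mpg plays the role of "read".
predictors <- c("wt", "hp", "qsec")
models <- sapply(predictors, function(x) {
  lm(mpg ~ ., data = mtcars[, c("mpg", x)])
}, simplify = FALSE)

summaries <- lapply(models, summary)

# One row per model: slope estimate, its p-value, and R-squared.
results <- data.frame(
  predictor = names(summaries),
  estimate  = sapply(summaries, function(s) coef(s)[2, "Estimate"]),
  p_value   = sapply(summaries, function(s) coef(s)[2, "Pr(>|t|)"]),
  r_squared = sapply(summaries, function(s) s$r.squared),
  row.names = NULL
)
print(results)
```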
