Reputation: 3158
I have a large set of model specifications to test, which share a dv but have unique IVs. In the following example
foo <- data.frame(dv = sample(c(0,1), 100, replace=T),
x1 = runif(100),
x2 = runif(100))
I want the first model to only include x1, the second x2, the third both, and the fourth their interaction. So I thought a sensible way would be to build a list of formula statements:
bar <- list("x1",
"x2",
"x1+x2",
"x1*x2")
which I would then use in a llply
call from the plyr
package to obtain a list of model objects.
require(plyr)
res <- llply(bar, function(i) glm(dv ~ i, data = foo, family = binomial()))
Unfortunately I'm told
Error in model.frame.default(formula = dv ~ i, data = foo, drop.unused.levels = TRUE):variable lengths differ (found for 'i')
Obviously I'm mixing up something fundamental--do I need to manipulate the original foo list in some fashion?
Upvotes: 0
Views: 870
Reputation: 4659
Your problem is with how you are specifying the formula, since inside the function i is a variable. This would work:
glm(paste("dv ~", i), data = foo, family = binomial())
Upvotes: 3
Reputation: 173537
The problem is that dv ~ i
isn't a formula. i
is (inside the anonymous function) simply a symbol that represents a variable containing a character value.
Try this:
bar <- list("dv~x1",
"dv~x2",
"dv~x1+x2",
"dv~x1*x2")
res <- llply(bar, function(i) glm(i, data = foo, family = binomial()))
But setting statistical issues aside, it might possibly be easier to use something like ?step
or ?stepAIC
in the MASS package for tasks similar to this?
Upvotes: 2