Reputation: 4275
I want to write a small function that I can use for automatic feature selection in a logistic regression in R: test all subsets of predictor variables by brute force and then evaluate their classification performance via cross-validation (CV).
Surprisingly, I did not find a package that does this "all-subset feature selection", so I would like to implement it myself.
Unfortunately, my limited R knowledge keeps me from writing a loop that generates all subsets of a given vector, and I was wondering if someone could point me in the right direction.
Upvotes: 2
Views: 497
Reputation: 115390
Caveat incernor
The bestglm package is what you are after.
The function bestglm selects the best subset of inputs for the glm family. The selection methods available include a variety of information criteria as well as cross-validation.
The vignette goes through a number of examples.
library(bestglm)
data(SAheart)
# using cross-validation for selection
out <- bestglm(SAheart, IC = "CV", family = binomial, t = 10)
out
# CVd(d = 373, REP = 10)
# BICq equivalent for q in (0.190525988534159, 0.901583162187443)
# Best Model:
#                   Estimate Std. Error   z value     Pr(>|z|)
# (Intercept)    -6.44644451 0.92087165 -7.000372 2.552830e-12
# tobacco         0.08037533 0.02587968  3.105731 1.898095e-03
# ldl             0.16199164 0.05496893  2.946967 3.209074e-03
# famhistPresent  0.90817526 0.22575844  4.022774 5.751659e-05
# typea           0.03711521 0.01216676  3.050542 2.284290e-03
# age             0.05046038 0.01020606  4.944159 7.647325e-07
Upvotes: 5
Reputation: 5197
You can use paste() + combn(), e.g.
varnames <- c("a","b","c")
rhs <- unlist(lapply(seq_along(varnames), function(k) apply(combn(varnames, k), 2, paste, collapse = " + ")))
# build one formula per right-hand side; as.formula() is applied to each string in turn
formulae <- lapply(paste("z ~", rhs), as.formula)
... but perhaps there is a more elegant way?
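For what it's worth, here is a minimal sketch of how the subset enumeration could be tied to the cross-validation step from the question. It is only one possible approach, not a tested recipe: the data are simulated, the names dat, z, a, b, c and the helper cv_error() are placeholders made up for illustration, and reformulate() is used in place of paste() to build the formulas.

```r
# Sketch: enumerate every non-empty subset of predictors and score each
# logistic model by K-fold cross-validated misclassification rate.
# All data and names below are hypothetical placeholders.
set.seed(1)
n <- 200
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
dat$z <- rbinom(n, 1, plogis(1.5 * dat$a - 1 * dat$b))

varnames <- c("a", "b", "c")

# All non-empty subsets as a flat list of character vectors
subsets <- unlist(
  lapply(seq_along(varnames), function(k) combn(varnames, k, simplify = FALSE)),
  recursive = FALSE
)

# One formula per subset, e.g. z ~ a + b
formulae <- lapply(subsets, reformulate, response = "z")

# Manual K-fold CV of the misclassification rate for a single formula
cv_error <- function(form, data, K = 5) {
  folds <- sample(rep(seq_len(K), length.out = nrow(data)))
  errs <- vapply(seq_len(K), function(i) {
    fit  <- glm(form, family = binomial, data = data[folds != i, ])
    prob <- predict(fit, newdata = data[folds == i, ], type = "response")
    mean((prob > 0.5) != data$z[folds == i])
  }, numeric(1))
  mean(errs)
}

cv   <- vapply(formulae, cv_error, numeric(1), data = dat)
best <- formulae[[which.min(cv)]]
print(best)
```

Note that the number of subsets is 2^p - 1 for p predictors, so this brute-force loop is only practical for a small number of variables.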
Upvotes: 0
Reputation: 1069
Wouldn't drop1() and add1() be helpful for your purpose? They come with the usual caution that automatic feature selection may not always be the most appropriate thing to do, but I presume you have made an informed choice on this.
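As a rough illustration of what these two functions report for a logistic regression, here is a small sketch on simulated data. The data frame dat and the variables y, x1, x2, x3 are invented for the example, and the likelihood-ratio test is just one of the available test options.

```r
# Simulated data; all names here are hypothetical placeholders
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.2 * dat$x1 - 0.8 * dat$x2))

full <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
null <- glm(y ~ 1, family = binomial, data = dat)

# Effect of deleting each single term from the full model
dropped <- drop1(full, test = "Chisq")

# Effect of adding each single candidate term to the intercept-only model
added <- add1(null, scope = ~ x1 + x2 + x3, test = "Chisq")

print(dropped)
print(added)
```

Both calls consider single-term changes only, so they do not search all subsets; step() repeats such moves automatically using AIC.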
Upvotes: 0