user695652

Reputation: 4275

Computing all subsets of a vector in R

I want to write a small function for automatic feature selection in a logistic regression in R: test all subsets of the predictor variables by brute force, then evaluate each subset's classification performance via cross-validation (CV).

Surprisingly, I did not find a package that does this "all-subsets feature selection", so I would like to implement it myself.

Unfortunately, my limited R knowledge keeps me from writing a loop that generates all subsets of a given vector. Could someone point me in the right direction?

Upvotes: 2

Views: 497

Answers (3)

mnel

Reputation: 115390

Caveat incernor

The bestglm package is what you are after.

The function bestglm selects the best subset of inputs for the glm family. The selection methods available include a variety of information criteria as well as cross-validation.

The vignette goes through a number of examples.

library(bestglm)
data(SAheart)
# using cross-validation for selection (t = 10 replications)
out <- bestglm(SAheart, IC = 'CV', family = binomial, t = 10)
out
# CVd(d = 373, REP = 10)
# BICq equivalent for q in (0.190525988534159, 0.901583162187443)
# Best Model:
#                   Estimate Std. Error   z value     Pr(>|z|)
# (Intercept)    -6.44644451 0.92087165 -7.000372 2.552830e-12
# tobacco         0.08037533 0.02587968  3.105731 1.898095e-03
# ldl             0.16199164 0.05496893  2.946967 3.209074e-03
# famhistPresent  0.90817526 0.22575844  4.022774 5.751659e-05
# typea           0.03711521 0.01216676  3.050542 2.284290e-03
# age             0.05046038 0.01020606  4.944159 7.647325e-07

Upvotes: 5

petrelharp

Reputation: 5197

You can use paste() + combn(), e.g.

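varnames <- c("a", "b", "c")
# right-hand sides for every non-empty subset: "a", "b", "c", "a + b", ...
rhs <- unlist(lapply(seq_along(varnames), function(k) apply(combn(varnames, k), 2, paste, collapse = " + ")))
# as.formula() converts one string at a time, so build a list of formulas
formulae <- lapply(paste("z ~", rhs), as.formula)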

... but perhaps there is a more elegant way?
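
To connect this to the cross-validation step in the question, here is a minimal sketch, not a definitive implementation. It assumes a data frame dat with a 0/1 response z and predictors a, b, c (simulated here purely for illustration). cv_accuracy is a hypothetical helper name, and the 5 folds and the 0.5 cutoff are arbitrary choices.

```r
set.seed(1)

# Simulated data for illustration only: z depends on a and b, c is noise
n   <- 200
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
dat$z <- rbinom(n, 1, plogis(1.5 * dat$a - dat$b))

varnames <- c("a", "b", "c")

# All non-empty subsets as right-hand sides, then one formula per subset
rhs <- unlist(lapply(seq_along(varnames), function(k)
  apply(combn(varnames, k), 2, paste, collapse = " + ")))
formulae <- lapply(paste("z ~", rhs), as.formula)

# Assign every row to one of 5 folds once, so all models see the same splits
folds <- sample(rep(seq_len(5), length.out = nrow(dat)))

# Hypothetical helper: k-fold CV classification accuracy for one formula
cv_accuracy <- function(form, data, folds) {
  response <- all.vars(form)[1]          # name of the left-hand side variable
  correct  <- 0
  for (i in sort(unique(folds))) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    fit   <- glm(form, data = train, family = binomial)
    prob  <- predict(fit, newdata = test, type = "response")
    # classify as 1 when the predicted probability exceeds 0.5
    correct <- correct + sum((prob > 0.5) == (test[[response]] == 1))
  }
  correct / nrow(data)
}

acc <- vapply(formulae, cv_accuracy, numeric(1), data = dat, folds = folds)
names(acc) <- rhs

# Subset with the highest cross-validated accuracy
best <- formulae[[which.max(acc)]]
```

Note that p predictors give 2^p - 1 non-empty subsets, so this brute-force approach is only practical for a small number of variables.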

Upvotes: 0

msp

Reputation: 1069

Wouldn't drop1() and add1() be helpful for your purpose? They come with the usual caution that automatic feature selection may not always be the most appropriate thing to do, but I presume you have made an informed choice on this.
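
For reference, a minimal sketch of how these two functions might be called on a logistic regression. The data frame dat and its columns z, a, b, c are assumptions, simulated only so that the example runs. Both functions evaluate single-term changes to one fitted model, so they are building blocks for a stepwise search, not an exhaustive search over all subsets.

```r
set.seed(1)

# Simulated data for illustration only
n   <- 200
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
dat$z <- rbinom(n, 1, plogis(1.5 * dat$a - dat$b))

# Start from a model containing two of the three predictors
fit <- glm(z ~ a + c, data = dat, family = binomial)

# Effect of deleting each term currently in the model, one at a time
dropped <- drop1(fit, test = "Chisq")

# Effect of adding each candidate term from the scope, one at a time
added <- add1(fit, scope = ~ a + b + c, test = "Chisq")

print(dropped)
print(added)
```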

Upvotes: 0
