John Smith
John Smith

Reputation: 2886

R: Feature Selection with Cross Validation using Caret on Logistic Regression

I am currently learning how to implement logistical Regression in R

I have taken a data set and split it into a training and test set and wish to implement forward selection, backward selection and best subset selection using cross validation to select the best features. I am using caret to implement cross-validation on the training data set and then testing the predictions on the test data.

I have seen the rfe control in caret and had also had a look at the documentation on the caret website as well as following the links on the question How to use wrapper feature selection with algorithms in R?. It isn't apparent to me how to change the type of feature selection as it seems to default to backward selection. Can anyone help me with my workflow. Below is a reproducible example

library("caret")

# Create an Example Dataset from German Credit Card Dataset
mydf <- GermanCredit

# Create Train and Test Sets 80/20 split
trainIndex <- createDataPartition(mydf$Class, p = .8, 
                              list = FALSE, 
                              times = 1)

train <- mydf[ trainIndex,]
test  <- mydf[-trainIndex,]


ctrl <- trainControl(method = "repeatedcv", 
                 number = 10, 
                 savePredictions = TRUE)

mod_fit <- train(Class~., data=train, 
             method="glm", 
             family="binomial",
             trControl = ctrl, 
             tuneLength = 5)


# Check out Variable Importance
varImp(mod_fit)
summary(mod_fit)

# Test the new model on new and unseen Data for reproducibility
pred = predict(mod_fit, newdata=test)
accuracy <- table(pred, test$Class)
sum(diag(accuracy))/sum(accuracy)

Upvotes: 3

Views: 4175

Answers (1)

docindata
docindata

Reputation: 48

You can simply call it in mod_fit. When it comes to backward stepwise the code below is sufficient

trControl <- trainControl(method="cv",
                          number = 5,
                          savePredictions = T,
                          classProbs = T,
                          summaryFunction = twoClassSummary)

caret_model <- train(Class~.,
                     train,
                     method="glmStepAIC", # This method fits best model stepwise.
                     family="binomial",
                     direction="backward", # Direction
                     trControl=trControl)

Note that in trControl

method= "cv", # No need to call repeated here, the number defined afterward defines the k-fold.
classProbs = T,
summaryFunction = twoClassSummary # Gives back ROC, sensitivity and specifity of the chosen model.

Upvotes: 1

Related Questions