user86533

Reputation: 333

glmnet & caret: ROC, Sensitivity, Specificity of training model

I would like to use glmnet to fit a binomial logistic regression model. I could use either caret or the glmnet package directly. Let's take data(BinomialExample) as an example and run the following code, where I've implemented both approaches:

#rm(list = ls(all.names = TRUE))

library(glmnet)    
library(caret)
data(BinomialExample)

y[y==0] = "low"
y[y==1] = "high"
y <- as.factor(y)

#split data in training & validation set

set.seed(1)
splitSample <- createDataPartition(y, p = 0.8, list = FALSE)
training_expression <- x[splitSample,]
training_phenotype <- y[splitSample]
validation_expression <- x[-splitSample,]
validation_phenotype <- y[-splitSample]

#####################
##GLMNET with CARET##
#####################
eGrid <- expand.grid(.alpha=seq(0.1,0.9, by=0.1),.lambda=seq(0,1,by=0.01))
Control <- trainControl(verboseIter=TRUE, classProbs=TRUE, summaryFunction=twoClassSummary, method="cv") 

set.seed(1)
netFit <- train(x = training_expression, y = training_phenotype,
                method = "glmnet", metric = "ROC",
                tuneGrid = eGrid, trControl = Control)
netFitPerf <- getTrainPerf(netFit) 
trainROC <- netFitPerf[,1]
trainSens <- netFitPerf[,2]
trainSpec <- netFitPerf[,3] 
trainAlpha <- netFit$bestTune[,1]
trainLambda <- netFit$bestTune[,2]
print(sprintf("ROC: %s Sens: %s Spec: %s Alpha: %s Lambda: %s", round(trainROC,2), round(trainSens,2), round(trainSpec,2), round(trainAlpha,2), round(trainLambda,2))) 

predict_validation <- predict(netFit, newdata = validation_expression)
confusionMatrix(predict_validation,validation_phenotype)

######################
#GLMNET without CARET#
######################
set.seed(1)
elasticnet <- cv.glmnet(training_expression, training_phenotype, family = "binomial",
                        type.measure = "class", nfolds = 10, alpha = 0.5, nlambda = 100)
plot(elasticnet)
predict_validation <- predict(elasticnet, newx = validation_expression, s = c(elasticnet$lambda.min), type = "class")
confusionMatrix(predict_validation,validation_phenotype)

As you can see, if I use the caret package I can easily print the ROC, Sensitivity and Specificity of the model. However, I was not able to find a similar way to get ROC, Sensitivity and Specificity when I use glmnet directly without caret. Is there a similar way to obtain these metrics?

Thanks for your help!

Upvotes: 3

Views: 6698

Answers (2)

Paritra Mandal Jain

Reputation: 1

You can easily see the AUC and other measures like MSE using the assess.glmnet function directly.
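For example, a minimal sketch (assuming glmnet >= 3.0, where assess.glmnet() is available, and the elasticnet fit and validation objects from the question):

# Sketch only: evaluate the cv.glmnet fit on the validation set.
# For a binomial fit this returns deviance, class (misclassification
# error), auc, mse and mae.
library(glmnet)

assess.glmnet(elasticnet,
              newx = validation_expression,
              newy = validation_phenotype,
              s = "lambda.min")  # s is passed on to predict(); drop it to use the default lambda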

Upvotes: 0

eipi10

Reputation: 93761

You can get the values you want from various objects produced by your glmnet workflow. For example, if you do

cm = confusionMatrix(predict_validation,validation_phenotype)

then cm$byClass includes Specificity and Sensitivity:

cm$byClass
     Sensitivity          Specificity       Pos Pred Value       Neg Pred Value           Prevalence 
       0.8181818            1.0000000            1.0000000            0.8000000            0.5789474 
  Detection Rate Detection Prevalence    Balanced Accuracy 
       0.4736842            0.4736842            0.9090909 

Likewise, you can get Lambda from elasticnet$lambda.min and alpha from gsub(".*alpha = ([0-9]\\.[0-9]*).*","\\1",deparse(elasticnet$glmnet.fit$call)[2]) (although there may be a better way than that monstrous piece of code). Actually, since the alpha value is an input to the function, you don't even need to extract it.

However, if you cross-validate on alpha in addition to lambda, you'd need to use a loop to try out multiple alpha values and then you'd need some way to extract the alpha value of the best model. If you decide to include alpha in the cross-validation, be sure to read the Details section of cv.glmnet.
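A rough sketch of that loop (not from the original answer; it assumes the training objects from the question and uses a fixed foldid so the folds are identical for every alpha value, as the cv.glmnet Details section suggests):

# Sketch only: cross-validate over alpha as well as lambda.
set.seed(1)
foldid <- sample(rep(1:10, length.out = length(training_phenotype)))

alphas <- seq(0.1, 0.9, by = 0.1)
cv_fits <- lapply(alphas, function(a) {
  cv.glmnet(training_expression, training_phenotype, family = "binomial",
            type.measure = "class", foldid = foldid, alpha = a)
})

# Pick the alpha/lambda pair with the lowest cross-validated misclassification error
best_idx    <- which.min(sapply(cv_fits, function(fit) min(fit$cvm)))
best_alpha  <- alphas[best_idx]
best_fit    <- cv_fits[[best_idx]]
best_lambda <- best_fit$lambda.min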

For the AUC of the ROC curve, cv.glmnet will give you that, but you'd need to use type.measure="auc" instead of type.measure="class", which would change how the best model is selected. Also, with this particular data sample, you need to use fewer CV folds, but that might not be an issue with your real data. For example:

elasticnet <- cv.glmnet(training_expression, training_phenotype, family = "binomial", 
                        type.measure = "auc", nfolds=5, alpha=0.5, nlambda = 100) 

Then, to get the AUC:

elasticnet$cvm[which(elasticnet$lambda==elasticnet$lambda.min)] 

or

max(elasticnet$cvm)

If you want to calculate the AUC without using AUC to select the best model, you might have to calculate that yourself or use a pre-existing function for this, such as auc from the pROC package.
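For instance, a small sketch with pROC (assuming the elasticnet fit and validation objects from the question; roc() and auc() come from the pROC package):

# Sketch only: validation-set AUC, independent of how the model was tuned.
library(pROC)

# Predicted probabilities at the chosen lambda; for a factor response,
# glmnet models the probability of the second factor level.
prob_validation <- predict(elasticnet, newx = validation_expression,
                           s = "lambda.min", type = "response")

roc_obj <- roc(response = validation_phenotype,
               predictor = as.numeric(prob_validation))
auc(roc_obj)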

Upvotes: 4
