Reputation: 333
I would like to use GLMNET to fit a binomial logistic regression model. I could use either caret or the glmnet-package directly. Lets take the data(BinomialExample) as an example to execute the following code where I´ve implemented both:
#rm(list = ls(all.names = TRUE))
library(glmnet)
library(caret)
data(BinomialExample)
y[y==0] = "low"
y[y==1] = "high"
y <- as.factor(y)
#split data in training & validation set
set.seed(1)
splitSample <- createDataPartition(y, p = 0.8, list = FALSE)
training_expression <- x[splitSample,]
training_phenotype <- y[splitSample]
validation_expression <- x[-splitSample,]
validation_phenotype <- y[-splitSample]
#####################
##GLMNET with CARET##
#####################
eGrid <- expand.grid(.alpha=seq(0.1,0.9, by=0.1),.lambda=seq(0,1,by=0.01))
Control <- trainControl(verboseIter=TRUE, classProbs=TRUE, summaryFunction=twoClassSummary, method="cv")
set.seed(1)
netFit <- train(x = training_expression, y = training_phenotype,method = "glmnet", metric = "ROC", tuneGrid=eGrid,trControl = Control)
netFitPerf <- getTrainPerf(netFit)
trainROC <- netFitPerf[,1]
trainSens <- netFitPerf[,2]
trainSpec <- netFitPerf[,3]
trainAlpha <- netFit$bestTune[,1]
trainLambda <- netFit$bestTune[,2]
print(sprintf("ROC: %s Sens: %s Spec: %s Alpha: %s Lambda: %s", round(trainROC,2), round(trainSens,2), round(trainSpec,2), round(trainAlpha,2),round(trainLambda,2)))
predict_validation <- predict(netFit, newdata = validation_expression)
confusionMatrix(predict_validation,validation_phenotype)
######################
#GLMNET without CARET#
######################
set.seed(1)
elasticnet <- cv.glmnet(training_expression, training_phenotype, family = "binomial", type.measure = "class", nfolds=10, alpha=0.5, nlambda = 100)
plot(elasticnet)
predict_validation <- predict(elasticnet, newx = validation_expression, s = c(elasticnet$lambda.min), type = "class")
confusionMatrix(predict_validation,validation_phenotype)
As you can see if I use the caret packet I can easily print the ROC, Sensitivity and Specificity of the model. However I was not able to find a similar way to print ROC, Sens, Spec if I use glmnet directly without CARET - is there a similar way to get these metrics?
Thanks for your help!
Upvotes: 3
Views: 6698
Reputation: 1
You can easily see the AUC and other measures like MSE using the assess.glmnet function directly.
Upvotes: 0
Reputation: 93761
You can get the values you want from various objects produced by your glmnet
workflow. For example, if you do
cm = confusionMatrix(predict_validation,validation_phenotype)
then cm$byClass
includes Specificity and Sensitivity:
cm$byClass
Sensitivity Specificity Pos Pred Value Neg Pred Value Prevalence
0.8181818 1.0000000 1.0000000 0.8000000 0.5789474
Detection Rate Detection Prevalence Balanced Accuracy
0.4736842 0.4736842 0.9090909
Likewise, you can get Lambda from elasticnet$lambda.min
and alpha
from gsub(".*alpha = ([0-9]\\.[0-9]*).*","\\1",deparse(elasticnet$glmnet.fit$call)[2])
(although there may be a better way than that monstrous piece of code). Actually, since the alpha
value is an input to the function, you don't even need to extract it. However, if you cross-validate on alpha
in addition to lambda
you'd need to use a loop to try out multiple alpha
values and then you'd need some way to extract the alpha
value of the best model. If you decide to include alpha
in the cross-validation, be sure to read the Details
section of cv.glmnet
.
For the AUC of the ROC curve, cv.glmnet
will give you that, but you'd need to use type.measure="auc"
instead of type.measure="class"
, which would change how the best model is selected. Also, with this particular data sample, you need to use fewer CV folds, but that might not be an issue with your real data. For example:
elasticnet <- cv.glmnet(training_expression, training_phenotype, family = "binomial",
type.measure = "auc", nfolds=5, alpha=0.5, nlambda = 100)
Then, to get the AUC:
elasticnet$cvm[which(elasticnet$lambda==elasticnet$lambda.min)]
or
max(elasticnet$cvm)
If you want to calculate the AUC without using AUC to select the best model, you might have to calculate that yourself or use a pre-existing function for this, such as auc
from the pROC
package.
Upvotes: 4