Reputation: 333
I have 200 patients allocated to a training and a validation set in a 2:1 ratio. I use caret with glmnet to train a classifier that predicts a binary phenotype:
splitSample <- createDataPartition(phenotype, p = 0.66, list = FALSE)
training_expression <- expression[splitSample, ]
training_phenotype <- phenotype[splitSample]
validation_expression <- expression[-splitSample, ]
validation_phenotype <- phenotype[-splitSample]

eGrid <- expand.grid(.alpha = seq(0, 1, by = 0.1), .lambda = seq(0, 1, by = 0.01))

Control <- trainControl(method = "cv", number = 10, repeats = 1,
                        verboseIter = FALSE, classProbs = TRUE,
                        summaryFunction = twoClassSummary)

netFit <- train(x = training_expression, y = training_phenotype,
                method = "glmnet", metric = "ROC",
                tuneGrid = eGrid, trControl = Control)

netFitPerf <- getTrainPerf(netFit)

predict_validation <- predict(netFit, newdata = validation_expression)
confusionMatrix(predict_validation, validation_phenotype)
"predict_validation" contains the predicted phenotype labels for each patient in the validation set. Is there any valid method to also obtain "predicted" phenotype labels for each patient in the training set, so that predicted labels are available for all patients? This would be important for further statistical analysis, e.g. to correlate the predicted phenotype with other parameters such as age or survival. Any ideas?
Thanks for your help!
Upvotes: 1
Views: 88
Reputation: 14316
It would be important to use the held-out predictions for the training set; simply re-predicting on the data the model was fit to would give overfit values.

If you use the option trainControl(savePredictions = "final"), the train object will have an element called pred that contains the hold-out predictions.
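A minimal sketch of how this looks with the code from the question (assuming training_expression, training_phenotype, and eGrid are defined as above; note that pred stores one out-of-fold prediction per training sample, indexed by rowIndex):

    Control <- trainControl(method = "cv", number = 10,
                            classProbs = TRUE, summaryFunction = twoClassSummary,
                            savePredictions = "final")

    netFit <- train(x = training_expression, y = training_phenotype,
                    method = "glmnet", metric = "ROC",
                    tuneGrid = eGrid, trControl = Control)

    # Hold-out predictions, made only with the final tuning parameters:
    head(netFit$pred)

    # Reorder by the original row index so the labels line up with
    # training_phenotype, then extract the held-out class labels:
    oof <- netFit$pred[order(netFit$pred$rowIndex), ]
    training_predicted <- oof$pred

These held-out labels can then be combined with predict_validation to get a prediction for every patient.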
Max
Upvotes: 1