user86533

Reputation: 333

Caret - predict phenotype labels for the training set?

I have 200 patients who are allocated to a training and a validation set in a 2:1 ratio. I use caret with glmnet to train a classifier that predicts a binary phenotype:

library(caret)

# Split the patients 2:1 into training and validation sets, stratified by phenotype
splitSample <- createDataPartition(phenotype, p = 0.66, list = FALSE)
training_expression <- expression[splitSample, ]
training_phenotype <- phenotype[splitSample]
validation_expression <- expression[-splitSample, ]
validation_phenotype <- phenotype[-splitSample]

# Tune alpha and lambda by 10-fold cross-validation, optimizing the ROC AUC
# (repeats is dropped: it only applies to method = "repeatedcv")
eGrid <- expand.grid(.alpha = seq(0, 1, by = 0.1), .lambda = seq(0, 1, by = 0.01))
Control <- trainControl(method = "cv", number = 10, verboseIter = FALSE, classProbs = TRUE, summaryFunction = twoClassSummary)
netFit <- train(x = training_expression, y = training_phenotype, method = "glmnet", metric = "ROC", tuneGrid = eGrid, trControl = Control)
netFitPerf <- getTrainPerf(netFit)

# Predict and evaluate on the validation set
predict_validation <- predict(netFit, newdata = validation_expression)
confusionMatrix(predict_validation, validation_phenotype)

"predict_validation" contains the predicted phenotype label for each patient in the validation set. Is there a valid method to also obtain "predicted" phenotype labels for each patient in the training set, so that predicted labels are available for all patients? This would be important for further statistical analysis, e.g. comparing the predicted labels of all patients with other parameters such as age or survival. Any ideas?

Thanks for your help!

Upvotes: 1

Views: 88

Answers (1)

topepo

Reputation: 14316

It is important to use the held-out predictions for the training set; simply re-predicting the training samples with the final model would give overfit (overly optimistic) values.

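If you use the option trainControl(savePredictions = "final"), the train object will have an element called pred containing the held-out predictions for the optimal tuning parameters. Its rowIndex column identifies the row of the training set that each prediction belongs to.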
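As a rough sketch of how this could look in practice, the example below trains on simulated data and combines the held-out training predictions with the validation predictions. The simulated data, the small tuning grid, and the names expr, heldOut and predicted_all are assumptions made for illustration; they are not from the question.

```r
library(caret)  # the glmnet package must also be installed

set.seed(1)
# Hypothetical stand-in data: 200 patients, 20 features, binary phenotype
n <- 200
expr <- matrix(rnorm(n * 20), nrow = n,
               dimnames = list(NULL, paste0("gene", 1:20)))
phenotype <- factor(ifelse(expr[, 1] + rnorm(n) > 0, "case", "control"))

splitSample <- createDataPartition(phenotype, p = 0.66, list = FALSE)
training_expression <- expr[splitSample, ]
training_phenotype <- phenotype[splitSample]
validation_expression <- expr[-splitSample, ]

# savePredictions = "final" keeps the hold-out predictions made with the
# best tuning parameters
Control <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                        summaryFunction = twoClassSummary,
                        savePredictions = "final")
netFit <- train(x = training_expression, y = training_phenotype,
                method = "glmnet", metric = "ROC",
                tuneGrid = expand.grid(alpha = c(0, 0.5, 1),
                                       lambda = c(0.01, 0.1)),
                trControl = Control)

# With plain (non-repeated) CV each training row is held out exactly once;
# rowIndex refers to rows of the training set, so sort to restore its order
heldOut <- netFit$pred[order(netFit$pred$rowIndex), ]

# Assemble one predicted label per patient, in the original patient order
predicted_all <- factor(rep(NA, n), levels = levels(phenotype))
predicted_all[splitSample] <- as.character(heldOut$pred)
predicted_all[-splitSample] <- as.character(
  predict(netFit, newdata = validation_expression))
```

With repeated cross-validation each training row would appear once per repeat in netFit$pred, so the predictions would need to be aggregated per rowIndex before combining them.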

Max

Upvotes: 1