I am trying to understand how much each variable contributes to the individual predictions of my models. I have built three GLM models (three different binary Yes/No outcomes) and one GBM model (binary Yes/No outcome) with the caret package. The code below shows only one of the GLM models; the others are essentially the same, differing only in y (and the last model is a GBM).
I would like to obtain the individual Shapley values for all four models, but I keep getting errors and do not know how to resolve them. Can someone please help me?
My code:
library(caret) # provides createDataPartition(), trainControl() and train()
##1. Splitting data into a training set and a test set
#Setting seed for reproducibility
set.seed(123)
#createDataPartition does a stratified random split of the data
inTrain_general <- createDataPartition(y=BA_satisfaction$satisfaction.changescore.dissatisfied, p = 0.8, list = FALSE) #y = the outcome data, p = the percentage of data in the training set
#To partition the data, identical for all 3 algorithms
training_GLM <- BA_satisfaction[inTrain_general,]
testing_GLM <- BA_satisfaction[-inTrain_general,]
#Setting seed for reproducibility
set.seed(123)
# Specify the type of training method used and the number of folds
ctrlspecs <- trainControl(method="cv", number=5,
savePredictions="all",
classProbs = TRUE)
#Set random seed for reproducibility
set.seed(123)
# Centering, scaling and imputation
#Specify logistic regression model (in one step)
#training_GLM <- training_GLM %>%
# droplevels()
# names(BA_satisfaction)
model1_GLM_train <- train(satisfaction.changescore.dissatisfied ~ age + BMI + cosm_history + smoking + satisfaction + psychosocial + physical + sexual + plane + shape + maxweightvol,
data = as.data.frame(training_GLM),
method="glm", family=binomial,
trControl=ctrlspecs,
na.action = na.pass,
preProcess = c("center", "scale", "nzv", "knnImpute"))
I have a test set, and also another hold-out set.
# Predict outcome using model from training_GLM applied to the testing_GLM
testing_GLM$probability <- predict(model1_GLM_train, newdata=testing_GLM, type = "prob", na.action = na.pass)[,2]
# Predict outcome using model from training_GLM applied to the BA_holdoutset_satisfaction
BA_holdoutset_satisfaction$probability <- predict(model1_GLM_train, newdata=BA_holdoutset_satisfaction, type = "prob", na.action = na.pass)[,2]
#Example of the GBM model
model3_GBM_train <- train(sexual.changescore.dissatisfied ~ leeftijd + BMI + cosm_vg + roken + satisfaction + psychosocial + physical + sexual + plane + shape + maxweightvol,
data = as.data.frame(training_GBM),
method="gbm",
trControl=ctrlspecs,
preProcess = c("center", "scale", "nzv", "knnImpute"),
na.action = na.pass,
metric = "Accuracy")
I then tried to compute Shapley values with kernelSHAP, but I get various error messages and have no clue how to solve them. Can someone help me with the Shapley values for GLM and GBM models in caret?
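For reference, a minimal sketch of the kind of call I am aiming for, written against a toy data set instead of my real data. Everything here is an assumption for illustration: the iris subset stands in for BA_satisfaction, `pred_yes` is a hypothetical helper that returns the probability of the "Yes" class, and the call assumes a version of the kernelshap package that accepts `bg_X` and `pred_fun`. It is not a confirmed fix for my models.

```r
library(caret)
library(kernelshap)

set.seed(123)

# Toy binary outcome (assumption): two iris species recoded as No/Yes
dat <- iris[iris$Species != "setosa", c("Sepal.Length", "Sepal.Width", "Species")]
dat$outcome <- factor(ifelse(dat$Species == "virginica", "Yes", "No"),
                      levels = c("No", "Yes"))
dat$Species <- NULL
xvars <- setdiff(names(dat), "outcome")

# Same style of caret call as above, with the preprocessing kept inside train()
fit <- train(outcome ~ ., data = dat,
             method = "glm", family = binomial,
             trControl = trainControl(method = "cv", number = 5, classProbs = TRUE),
             preProcess = c("center", "scale"))

# Hypothetical helper: return one numeric vector, the probability of "Yes"
pred_yes <- function(model, newdata) {
  predict(model, newdata = newdata, type = "prob")[, "Yes"]
}

X_explain <- dat[1:20, xvars]             # rows to explain, predictors only
bg <- dat[sample(nrow(dat), 50), xvars]   # background sample of the training data

shap <- kernelshap(fit, X = X_explain, bg_X = bg,
                   pred_fun = pred_yes, verbose = FALSE)

head(shap$S)   # one row per explained observation, one column per predictor
```

Because the explainer only talks to the model through `pred_yes`, the same pattern should in principle apply to a caret model trained with method="gbm", as long as the wrapper returns a single probability column.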
The output I would like to get is the one shown in this linked question: How to get SHAP values for caret models in R?
I tried the packages DALEX, iml, shap, etc., but none of them seem to work with a caret GLM or GBM.