Kim Phi Luong

Shapley values for GLM and GBM prediction models built with the caret package

I am trying to write code to understand how much each variable contributes to the individual predictions of my prediction models. I have built three GLM models (three different binary Yes/No outcomes) and one GBM model (binary Yes/No outcome) with the caret package. The code below shows one of the GLM models and the GBM model; the other GLM models are the same apart from the outcome variable y.

I would like to obtain the individual Shapley values for all four models, but I keep getting errors and cannot figure out how to resolve them. Can someone please help me?

My code:

library(caret)

## 1. Splitting data into two or three groups: training set/test set
# Set seed for reproducibility
set.seed(123)
# createDataPartition does a stratified random split of the data
# y = the outcome data, p = the proportion of data in the training set
inTrain_general <- createDataPartition(y = BA_satisfaction$satisfaction.changescore.dissatisfied, p = 0.8, list = FALSE)

# Partition the data (identical for all 3 algorithms)
training_GLM <- BA_satisfaction[inTrain_general, ]
testing_GLM <- BA_satisfaction[-inTrain_general, ]

#Setting seed for reproducibility
set.seed(123)
# Specify the resampling method and the number of folds
ctrlspecs <- trainControl(method="cv", number=5,
                      savePredictions="all",
                      classProbs = TRUE)

# Set random seed for reproducibility
set.seed(123)

# Specify the logistic regression model (in one step);
# centering, scaling and imputation are done through preProcess
#training_GLM <- training_GLM %>% 
#  droplevels()
# names(BA_satisfaction)
model1_GLM_train <- train(satisfaction.changescore.dissatisfied ~ age + BMI + cosm_history + smoking + satisfaction + psychosocial + physical + sexual + plane + shape + maxweightvol,
                      data = as.data.frame(training_GLM),
                      method = "glm", family = binomial,
                      trControl = ctrlspecs,
                      na.action = na.pass,
                      preProcess = c("center", "scale", "nzv", "knnImpute"))

I have a test set and an additional hold-out set:
# Predict outcome using model from training_GLM applied to the testing_GLM
testing_GLM$probability <- predict(model1_GLM_train, newdata=testing_GLM, type = "prob", na.action = na.pass)[,2]

# Predict outcome using model from training_GLM applied to the BA_holdoutset_satisfaction
BA_holdoutset_satisfaction$probability <- predict(model1_GLM_train, newdata = BA_holdoutset_satisfaction, type = "prob", na.action = na.pass)[,2]

#Example of the GBM model
model3_GBM_train <- train(sexual.changescore.dissatisfied ~ leeftijd + BMI + cosm_vg + roken + satisfaction + psychosocial + physical + sexual + plane + shape + maxweightvol,
                          data = as.data.frame(training_GBM),
                          method="gbm",
                          trControl=ctrlspecs,
                          preProcess = c("center", "scale", "nzv", "knnImpute"),
                          na.action = na.pass,
                          metric = "Accuracy")

I then tried to compute Shapley values with kernelSHAP, but I get various error messages and have no idea how to resolve them. Can someone help me compute Shapley values for GLM and GBM models trained with caret?
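For reference, here is a minimal sketch of what a kernelSHAP call on a caret model might look like. It assumes the `kernelshap` package is installed and uses `mtcars` as stand-in data, not `BA_satisfaction`. The helper `pred_yes` is a hypothetical name introduced only for this example. The two points it tries to illustrate are that `X` holds only the feature columns named in the formula, and that the prediction function returns a plain numeric vector (here the probability of class "Yes"). Missing values and `knnImpute` are not covered.

```r
library(caret)
library(kernelshap)

set.seed(123)

# Toy binary outcome (assumption: mtcars stands in for the real data)
dat <- mtcars[, c("am", "wt", "hp")]
dat$am <- factor(ifelse(dat$am == 1, "Yes", "No"), levels = c("No", "Yes"))

# caret GLM without resampling, to keep the sketch short
fit <- train(am ~ wt + hp,
             data = dat,
             method = "glm", family = binomial,
             trControl = trainControl(method = "none", classProbs = TRUE),
             preProcess = c("center", "scale"))

# Feature columns only, with the same names as in the formula
X <- dat[, c("wt", "hp")]

# Hypothetical helper: numeric vector with the predicted probability of "Yes"
pred_yes <- function(model, newdata) {
  predict(model, newdata = newdata, type = "prob")[, "Yes"]
}

# X = rows to explain, bg_X = background data used for the expectations
shap <- kernelshap(fit, X = X, bg_X = X, pred_fun = pred_yes, verbose = FALSE)

head(shap$S)    # one row per observation, one column per feature
shap$baseline   # average prediction over the background data
```

The same pattern should carry over to a `method = "gbm"` model, since only `predict()` on the caret object is used, but I have not verified that on the models above.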

The output I would like to get is the one shown in this question: How to get SHAP values for caret models in R?

I have tried the packages DALEX, iml, shap, etc., but none of them seems to work with a GLM or GBM.
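For the GLM case, one package-free sanity check is that a linear predictor has closed-form Shapley values on the log-odds scale: with expectations taken over a background data set, the contribution of feature j for observation i is beta_j * (x_ij - mean(x_j)). The sketch below shows this in base R on a plain `glm()` fit, with `mtcars` as stand-in data. It is not the caret model above; with `preProcess`, the coefficients would refer to the centered and scaled features instead.

```r
# Closed-form Shapley values for a plain logistic regression on the
# log-odds (link) scale. Assumption: mtcars stands in for the real data.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

X    <- as.matrix(mtcars[, c("wt", "hp")])   # feature matrix
beta <- coef(fit)[colnames(X)]               # slopes, without the intercept

# phi[i, j] = beta_j * (x_ij - mean(x_j))
centered <- sweep(X, 2, colMeans(X))         # subtract the column means
phi      <- sweep(centered, 2, beta, `*`)    # multiply each column by its slope

# Baseline = average log-odds prediction over the background data
baseline <- mean(predict(fit, type = "link"))

# Additivity check: baseline + row sums should reproduce the log-odds predictions
all.equal(unname(baseline + rowSums(phi)),
          unname(predict(fit, type = "link")))

head(phi)
```

These values explain the log-odds, not the predicted probability, so they will not match probability-scale output from kernelSHAP directly.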
