Reputation: 21
I am fitting a GBM model with 5-fold cross-validation. The outcome is binary (0 or 1), and the distribution is Bernoulli. I would like to use the cross-validated predicted values. However, when I look at the model's cv.fitted values, they are not between 0 and 1.
The 'gbm' package guide states the following for cv.fitted: "If cross-validation was performed, the cross-validation predicted values on the scale of the linear predictor. That is, the fitted values from the i-th CV-fold, for the model having been trained on the data in all other folds."
My code is:
gbm.fit <- gbm(
  lie ~ ., data = datatrain,
  distribution = "bernoulli",
  n.trees = 300,
  shrinkage = best_shrinkage,
  interaction.depth = best_depth,
  n.minobsinnode = best_obs,
  bag.fraction = best_subsample,
  cv.folds = 5,
  n.cores = NULL, # will use all cores by default
  verbose = TRUE
)
The variable lie is 0 or 1.
Extracting gbm.fit$cv.fitted yields values such as:
[1] 0.1565624979 0.1943624501 0.1137481303 0.1574121717 -0.5128581783 -0.0056283070 ...
Is there an option that can be specified such that the cv.fitted values will lie between 0 and 1? Why can they be negative and larger than 1?
Upvotes: 1
Views: 312
Reputation: 21
I posted this question on GitHub (gbm-developers), and Greg Ridgeway provided the following (very helpful) answer:
For the Bernoulli, the default fitted values are on the log odds scale, log(p/(1-p)). You can convert them to probabilities as 1/(1+exp(-predictedvalue)) or use the predict() function with type="response".
https://www.rdocumentation.org/packages/gbm/versions/2.1.8/topics/predict.gbm
Greg
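For illustration, here is a minimal sketch of that conversion in base R. It uses the six log-odds values printed in the question as stand-in data; the variable names cv_log_odds, cv_prob_manual and cv_prob are made up for this example. plogis() is base R's logistic function and should give the same result as the formula written out by hand.

```r
# Log-odds values copied from the cv.fitted output shown in the question
# (stand-in data; with a fitted model you would use gbm.fit$cv.fitted instead)
cv_log_odds <- c(0.1565624979, 0.1943624501, 0.1137481303,
                 0.1574121717, -0.5128581783, -0.0056283070)

# Inverse logit written out as in the answer: 1 / (1 + exp(-x))
cv_prob_manual <- 1 / (1 + exp(-cv_log_odds))

# The same transformation using the built-in logistic function
cv_prob <- plogis(cv_log_odds)

# Both approaches agree, and every value now lies strictly between 0 and 1
stopifnot(isTRUE(all.equal(cv_prob_manual, cv_prob)))
stopifnot(all(cv_prob > 0 & cv_prob < 1))
```

With the model from the question, the equivalent call would presumably be plogis(gbm.fit$cv.fitted). Note that this keeps the out-of-fold predictions, whereas predict() with type="response" scores data with the final model fitted on all of the training data, so the two are not interchangeable if cross-validated predictions are what you need.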
Upvotes: 0