msg

Reputation: 21

Why are GBM cv.fitted values (after Bernoulli distribution is used) not between 0 and 1?

I am estimating a GBM model with 5-fold cross-validation. The outcome is binary (0, 1), and the distribution is Bernoulli. I would like to use the cross-validated predicted values. However, when I look at the cv.fitted values of the model, they are not between 0 and 1.

The 'gbm' package guide states the following for cv.fitted: "If cross-validation was performed, the cross-validation predicted values on the scale of the linear predictor. That is, the fitted values from the i-th CV-fold, for the model having been trained on the data in all other folds."

My code is:

gbm.fit <- gbm(
    lie ~ ., data = datatrain,
    distribution = "bernoulli",
    n.trees = 300,
    shrinkage = best_shrinkage,
    interaction.depth = best_depth,
    n.minobsinnode = best_obs,
    bag.fraction = best_subsample,
    cv.folds = 5,
    n.cores = NULL, # will use all cores by default
    verbose = TRUE
)

The variable lie is 0 or 1.

Extracting gbm.fit$cv.fitted yields values: [1] 0.1565624979 0.1943624501 0.1137481303 0.1574121717 -0.5128581783 -0.0056283070 ...

Is there an option that can be specified such that the cv.fitted values will lie between 0 and 1? Why can they be negative and larger than 1?

Upvotes: 1

Views: 312

Answers (1)

msg

Reputation: 21

I posted this question on GitHub (gbm-developers), and Greg Ridgeway provided the following (very helpful) answer:

For the Bernoulli, the default fitted values are on the log odds scale, log(p/(1-p)). You can convert them to probabilities as 1/(1+exp(-predictedvalue)) or use the predict() function with type="response".

https://www.rdocumentation.org/packages/gbm/versions/2.1.8/topics/predict.gbm

Greg
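
To illustrate the conversion, here is a minimal sketch in base R that applies the inverse logit to the first few cv.fitted values quoted in the question. The variable names cv_log_odds and cv_prob are my own, and I am assuming the same transformation can be applied directly to gbm.fit$cv.fitted from the fitted model.

```r
# First few cv.fitted values copied from the question (log-odds scale).
cv_log_odds <- c(0.1565624979, 0.1943624501, 0.1137481303,
                 0.1574121717, -0.5128581783, -0.0056283070)

# Inverse logit, written out exactly as in the answer above.
cv_prob <- 1 / (1 + exp(-cv_log_odds))

# plogis() from base R's stats package computes the same transformation.
stopifnot(isTRUE(all.equal(cv_prob, plogis(cv_log_odds))))

# Every converted value now lies strictly between 0 and 1.
print(range(cv_prob))

# With the fitted model from the question (assumed to be in scope), the
# equivalent call would presumably be:
#   cv_prob_all <- plogis(gbm.fit$cv.fitted)
```

A log-odds value of 0 corresponds to a probability of 0.5, so negative cv.fitted values simply mean a predicted probability below 0.5. As far as I can tell, predict() with type="response" returns predictions from the final model fitted on all the data, so transforming cv.fitted directly seems to be the way to keep the out-of-fold predictions.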

Upvotes: 0
