Reputation: 36126
Quick question on prediction.
The value I’m trying to predict is either 0 or 1 (it is set as numeric, not as a factor) so when I run my random forest:
fit <- randomForest(PredictValue ~ <variables>, data=trainData, ntree=50)
and predict:
pred<-predict(fit, testData)
all my predictions are between 0 and 1, which is what I expect and, I imagine, can be interpreted as the probability of being 1.
Now, if I go through the same process using the gbm algorithm:
fitgbm <- gbm(PredictValue~ <variables>, data=trainData, distribution = "bernoulli", n.trees = 500, bag.fraction = 0.75, cv.folds = 5, interaction.depth = 3)
predgbm <- predict(fitgbm, testData)
the values range from -0.5 to 0.5.
I also tried glm and the range was worse, from around -3 to 3.
So, my question is: is it possible to set the algorithms to predict between 0 and 1?
Thanks
Upvotes: 1
Views: 148
Reputation: 37879
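You need to specify type='response' in the call to predict for this to happen. By default, predict on a gbm (or glm) fit returns values on the link scale, which for a Bernoulli model is the log-odds, so the values are not restricted to the range 0 to 1.
Check this example: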
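library(gbm)
# Toy data: 100 zeros, 100 ones, and an unrelated uniform predictor
y <- rep(c(0, 1), c(100, 100))
x <- runif(200)
df <- data.frame(y, x)
fitgbm <- gbm(y ~ x, data = df,
              distribution = "bernoulli", n.trees = 100)
# type = 'response' returns probabilities instead of log-odds
predgbm <- predict(fitgbm, df, n.trees = 100, type = 'response')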
This is too simplistic, but look at the summary of predgbm:
> summary(predgbm)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4936 0.4943 0.5013 0.5000 0.5052 0.5073
And, as the documentation mentions, this is the probability of y being 1:
If type="response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson.
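The same applies to the glm case in the question, assuming the model was fit with family = binomial (the question does not show the call). A minimal sketch on simulated data, with made-up variable names, comparing the default link-scale predictions to type = "response" and to a manual inverse-logit conversion with plogis:

```r
# Simulated data; names and coefficients are illustrative, not from the question.
set.seed(1)
x <- runif(200)
y <- rbinom(200, size = 1, prob = plogis(2 * x - 1))
df <- data.frame(y, x)

# Assumption: a logistic regression, i.e. family = binomial
fitglm <- glm(y ~ x, data = df, family = binomial)

link_pred <- predict(fitglm, df)                     # default type = "link": log-odds
prob_pred <- predict(fitglm, df, type = "response")  # probabilities

# The inverse logit maps log-odds onto probabilities,
# so the two approaches should agree.
all.equal(unname(plogis(link_pred)), unname(prob_pred))
range(link_pred)
range(prob_pred)
```

If you already have link-scale predictions from a Bernoulli gbm, applying plogis to them should give the same result as calling predict with type = 'response'.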
Upvotes: 1