Diego

Reputation: 36126

R prediction within an interval

Quick question on prediction.

The value I’m trying to predict is either 0 or 1 (it is set as numeric, not as a factor) so when I run my random forest:

fit <- randomForest(PredictValue ~ <variables>, data=trainData, ntree=50) 

and predict:

pred<-predict(fit, testData)

All my predictions are between 0 and 1, which is what I expect and, I imagine, can be interpreted as the probability of being 1.

Now, if I go through the same process using the gbm algorithm:

fitgbm <- gbm(PredictValue~ <variables>, data=trainData, distribution = "bernoulli", n.trees = 500,   bag.fraction = 0.75, cv.folds = 5, interaction.depth = 3)
predgbm <- predict(fitgbm, testData)

the values range from -0.5 to 0.5.

I also tried glm and the range was worse, from around -3 to 3.

So, my question is: is it possible to set the algorithms to predict between 0 and 1?

Thanks

Upvotes: 1

Views: 148

Answers (1)

LyzandeR

Reputation: 37879

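You need to specify type='response' in the call to predict for this to happen. By default, predict for a gbm model returns values on the link scale, which for distribution = "bernoulli" is the log-odds, and that is why you see negative values.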

Check this example:

library(gbm)

y <- rep(c(0,1),c(100,100))
x <- runif(200)
df <- data.frame(y,x)


fitgbm <- gbm(y ~ x, data=df, 
              distribution = "bernoulli", n.trees = 100)

predgbm <- predict(fitgbm, df, n.trees=100, type='response')

The example is simplistic, but look at the summary of predgbm:

> summary(predgbm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4936  0.4943  0.5013  0.5000  0.5052  0.5073 

And, as the documentation mentions, this is the probability of y being 1:

If type="response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson.
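The same idea should apply to the glm case from the question. The following is a minimal sketch, assuming the model was fit with family = binomial (the question does not say); the simulated data and the names fitglm, predlink and predresp are made up for illustration. It shows that predict on a glm also defaults to the link scale, and that the inverse logit (plogis in base R) maps those values onto probabilities:

```r
# Illustrative data only (hypothetical, not from the question)
set.seed(1)
x <- runif(200)
y <- rbinom(200, size = 1, prob = plogis(2 * x - 1))
df <- data.frame(y, x)

# Logistic regression; family = binomial is an assumption about the asker's model
fitglm <- glm(y ~ x, data = df, family = binomial)

# Default is type = "link": log-odds, which are not bounded to [0, 1]
predlink <- predict(fitglm, df)

# type = "response": probabilities of y being 1
predresp <- predict(fitglm, df, type = "response")

# Applying the inverse logit to the link-scale values gives the same probabilities
all.equal(unname(plogis(predlink)), unname(predresp))
range(predresp)
```

So if you already have link-scale predictions from a bernoulli gbm or a binomial glm, applying plogis to them should give the same result as asking for type = "response" directly.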

Upvotes: 1
