Reputation: 7
I have the Titanic dataset with the following 9 columns:
(i) Survived (0/1) [2 levels],
(ii) Pclass (1/2/3) [3 levels],
(iii) Sex (M/F) [2 levels],
(iv) Age (continuous variable),
(v) Fare (continuous variable),
(vi) Embarked (C/Q/S) [3 levels],
(vii) SibSp (integer count, treated as numeric),
(viii) Parch (integer count, treated as numeric), and
(ix) Titles (Mr/MsMrs/Master/X) [4 levels].
I am trying to predict Survived from the other eight using the gbm package in R, and I use the following:
fit.gbm = gbm(Survived ~ Age + Fare + SibSp + Parch + Pclass + Titles + Sex + Embarked , data=train , distribution = "adaboost", n.trees=500 , interaction.depth=3 , shrinkage=0.005)
Then I use
predd.gbm = predict(fit.gbm , newdata=train , type="response" , n.trees=500)
I don't understand the output: every value in predd.gbm is close to 1 (0.99983, 0.999974, etc.). How do I interpret these values, and how do I get 0/1 predictions from this "probability" vector?
Upvotes: 0
Views: 286
Reputation: 7
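Aaah, so the way to do it is to change (i) from a factor to a numeric 0/1 variable using:
train$Survived = as.numeric(as.character(train$Survived))
(Calling as.numeric() directly on a factor returns the level codes 1/2 rather than the labels 0/1, hence the as.character() step.) randomForest understands that Survived is a factor, but gbm doesn't: with distribution = "adaboost" it expects a numeric 0/1 response.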
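To illustrate the conversion and the final 0/1 step, here is a minimal base-R sketch. The vectors `survived` and `probs` are made-up stand-ins (not the Titanic data or real gbm output), and the 0.5 cutoff is an assumption you may want to tune:

```r
# Toy stand-in for train$Survived stored as a factor (hypothetical data).
survived <- factor(c("0", "1", "1", "0", "1"))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes  <- as.numeric(survived)                 # 1 2 2 1 2
labels <- as.numeric(as.character(survived))   # 0 1 1 0 1

# Hypothetical probabilities, standing in for what
# predict(fit.gbm, newdata = train, type = "response", n.trees = 500)
# would return once the response is coded 0/1.
probs <- c(0.12, 0.81, 0.55, 0.30, 0.93)

# Turn probabilities into class labels with an assumed 0.5 cutoff.
pred_class <- as.integer(probs > 0.5)

# Share of predictions that match the 0/1 labels.
accuracy <- mean(pred_class == labels)
```

With the real model, `probs` would be replaced by `predd.gbm`, and `labels` by the converted `train$Survived`.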
Upvotes: 0