Gravity Boy

Reputation: 7

How to use 'adaboost' distribution in 'gbm' to have a meaningful prediction?

I have the Titanic dataset with the following 9 columns:

(i) Survived (0/1) [2 levels],

(ii) Pclass (1/2/3) [3 levels],

(iii) Sex (M/F) [2 levels],

(iv) Age (continuous variable),

(v) Fare (continuous variable),

(vi) Embarked (C/Q/S) [3 levels],

(vii) SibSp (continuous variable),

(viii) Parch (continuous variable), and

(ix) Titles (Mr/MsMrs/Master/X) [4 levels].

I am trying to predict Survived from the other eight columns using the gbm package in R, with the following call:

fit.gbm = gbm(Survived ~ Age + Fare + SibSp + Parch + Pclass + Titles + Sex + Embarked, data = train, distribution = "adaboost", n.trees = 500, interaction.depth = 3, shrinkage = 0.005)

Then I use

predd.gbm = predict(fit.gbm, newdata = train, type = "response", n.trees = 500)

I don't understand the output: every value in predd.gbm looks like 0.99983, 0.999974, etc. How do I make sense of this, and how do I predict 0/1 from this odd predd.gbm "probability" where every element is close to 1?

Upvotes: 0

Views: 286

Answers (1)

Gravity Boy

Reputation: 7

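Aaah, so the way to do it is to change (i) Survived from a factor to a numeric 0/1 variable. Note that calling as.numeric() directly on a factor returns the internal level codes (1/2 here), so convert through character first:

train$Survived = as.numeric(as.character(train$Survived))

randomForest understands that Survived is a factor, but gbm with distribution = "adaboost" expects a numeric 0/1 response!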
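
To make the factor issue concrete, here is a minimal base-R sketch. It uses toy values rather than the Titanic data, and the names survived, probs and pred_class are made up for illustration. It shows that as.numeric() on a factor returns the level codes rather than the 0/1 labels, and one common way (an assumed 0.5 cutoff, not something gbm prescribes) to turn predicted probabilities into 0/1 classes:

```r
# Toy response stored as a factor with levels "0" and "1" (not the real Titanic data)
survived <- factor(c(0, 1, 1, 0, 1))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(survived)                  # 1 2 2 1 2

# Going through character recovers the original 0/1 values
labels <- as.numeric(as.character(survived))   # 0 1 1 0 1

# Hypothetical probabilities standing in for predict(fit.gbm, ..., type = "response")
probs <- c(0.12, 0.81, 0.64, 0.35, 0.50)

# Convert probabilities to 0/1 predictions; the 0.5 threshold is an assumption
pred_class <- as.integer(probs > 0.5)

# Share of toy predictions that match the 0/1 labels
accuracy <- mean(pred_class == labels)
```

After recoding train$Survived this way and refitting, the same thresholding step could be applied to predd.gbm. The cutoff can be tuned if the two classes are unbalanced.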

Upvotes: 0
