Reputation: 580
When I run a random forest model on my test data, I get different results from the same data set and the same model.
Here are the results of two identical calls; note the difference in the first column:
> table(predict(rfModelsL[[1]], newdata = a), a$earlyR)
        FALSE TRUE
  FALSE    14    7
  TRUE     13   66
> table(predict(rfModelsL[[1]], newdata = a), a$earlyR)
        FALSE TRUE
  FALSE    15    7
  TRUE     12   66
Although the difference is very small, I'd like to understand what causes it. My guess is that predict uses a "flexible" classification threshold, but I couldn't find anything about that in the documentation. Am I right?
Thank you in advance
Upvotes: 5
Views: 1414
Reputation: 1544
I will assume that you did not refit the model between the two calls, and that the predict call alone is producing the different results. The answer is probably this, from ?predict.randomForest:
Any ties are broken at random, so if this is undesirable, avoid it by using odd number ntree in randomForest()
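To check whether ties explain what you see, here is a minimal sketch on simulated data. The data and the names `train_x`, `test_x`, `rf_even` and `rf_odd` are made up for illustration and stand in for your `a` and `rfModelsL[[1]]`. It counts test rows whose votes are split evenly, then shows two ways to get repeatable predictions. It assumes the tie-breaking draws from R's random number generator, so that `set.seed()` controls it.

```r
library(randomForest)

# Simulated two-class data (hypothetical; replace with your own)
set.seed(1)
n <- 300
train_x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
train_y <- factor(train_x$x1 + rnorm(n) > 0)
test_x  <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Even number of trees: a 5-5 vote split is possible
rf_even <- randomForest(train_x, train_y, ntree = 10)

# Raw vote counts per class; rows with equal counts are ties
votes  <- predict(rf_even, newdata = test_x, type = "vote", norm.votes = FALSE)
n_tied <- sum(votes[, 1] == votes[, 2])
cat("Test rows with tied votes:", n_tied, "\n")

# Option 1: fix the seed right before each predict() call
set.seed(42); p1 <- predict(rf_even, newdata = test_x)
set.seed(42); p2 <- predict(rf_even, newdata = test_x)
identical(p1, p2)

# Option 2: use an odd ntree so a two-class vote can never tie
rf_odd <- randomForest(train_x, train_y, ntree = 11)
p3 <- predict(rf_odd, newdata = test_x)
p4 <- predict(rf_odd, newdata = test_x)
identical(p3, p4)
```

If `n_tied` is zero for your own model and data, ties are not the cause and something else is changing between the calls. Note that an odd `ntree` only rules out ties for two classes; with three or more classes a tie is still possible.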
Upvotes: 7