Reputation: 73
I am perplexed by the different results I obtained when I ran code like this:
set.seed(100)
test1 <- randomForest(BinaryY ~ ., data = Xvars, trees = 51, mtry = 5, seed = 200)
predict(test1, newdata = cbind(NewBinaryY, NewXs), type = "response")
and this code:
set.seed(100)
test2 <- randomForest(BinaryY ~ ., data = Xvars, trees = 51, mtry = 5, seed = 200, xtest = NewXs, ytest = NewBinaryY)
I expected the confusion matrices for the two forests to be identical because the seed settings are the same, but they differ, as do the predicted values and the votes. At first I thought the difference came from how ties are broken, so I changed the number of trees to an odd number to rule out ties.
Can anyone shed light on what I hope is a simple oversight? I can't figure out why the predictions from these two forests, applied to the same NewBinaryY and NewXs data, would not be the same.
Also, I noticed that the results are the same when I use only 1 tree.
Thanks for any hints and help.
Upvotes: 4
Views: 1339
Reputation: 15793
I believe xtest and ytest specify a test set that is evaluated during the random forest run itself, so that it is used in place of the randomly selected OOB samples. If that is the case, your two runs are being evaluated on different test data, which would produce different results.
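One way to check this is to run both approaches side by side on data you can share. The sketch below is only an illustration: it uses two species from the built-in iris data as a stand-in for your BinaryY and Xvars, and the names dat, train_x, train_y, new_x and new_y are made up for the example. It uses the x/y interface and the documented ntree argument; as far as I can tell, trees and seed are not documented arguments of randomForest, so it is worth confirming they do what you expect. From my reading of the package documentation, the OOB confusion matrix stays in the confusion component and the test-set results are stored separately under test, so make sure you compare like with like.

```r
library(randomForest)

# Stand-in binary data: two of the three iris species (not the asker's data).
dat <- droplevels(iris[iris$Species != "setosa", ])
set.seed(1)
idx <- sample(nrow(dat), 70)
train_x <- dat[idx, 1:4];  train_y <- dat$Species[idx]
new_x   <- dat[-idx, 1:4]; new_y   <- dat$Species[-idx]

# Forest 1: fit first, then predict the held-out rows afterwards.
set.seed(100)
rf1 <- randomForest(x = train_x, y = train_y, ntree = 51, mtry = 2)
pred1 <- predict(rf1, newdata = new_x, type = "response")

# Forest 2: same seed and settings, test set supplied during the fit.
set.seed(100)
rf2 <- randomForest(x = train_x, y = train_y, ntree = 51, mtry = 2,
                    xtest = new_x, ytest = new_y)
pred2 <- rf2$test$predicted

table(pred1, pred2)    # cross-tabulate the two sets of test predictions
mean(pred1 == pred2)   # share of test rows on which the two forests agree
rf2$confusion          # OOB confusion matrix, computed on the training rows
rf2$test$confusion     # confusion matrix on the supplied test rows
```

If the cross-tabulation has off-diagonal counts here too, the difference is reproducible outside your data and is not caused by how your data frames were assembled.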
Upvotes: 1