m_squared

Reputation: 245

R - caret::train "random forest" parameters

I'm trying to build a classification model on 60 variables and ~20,000 observations using the train() function from the caret package. I'm using the random forest method and am getting 0.999 Accuracy on my training set. However, when I use the model to predict, it classifies every test observation as the same class (i.e. each of the 20 observations is classified as a "1" out of 5 possible outcomes). I'm certain this is wrong (the test set is for a Coursera quiz, hence my not posting exact code), but I'm not sure what is happening.

My question: when I call the final model of fit (fit$finalModel), it says it made 500 total trees (default and expected), but the number of variables tried at each split is 35. I know that with classification, the standard number of variables tried at each split is the square root of the total number of predictors (so it should be sqrt(60) = 7.7, call it 8). Could this be the problem?

I'm confused about whether the problem is with my model, my data cleaning, or something else.

set.seed(10000)                                        # reproducibility
fitControl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation
fit <- train(y ~ ., data = training, method = "rf", trControl = fitControl)

fit$finalModel

Call:
 randomForest(x = x, y = y, mtry = param$mtry) 
           Type of random forest: classification
                 Number of trees: 500
No. of variables tried at each split: 41

    OOB estimate of  error rate: 0.01%
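As a side note on the mtry question above: caret's default grid search picks mtry for you, but you can supply your own candidate values via the tuneGrid argument. A minimal sketch on the built-in iris data (your data frame and outcome column will differ):

```r
library(caret)
set.seed(10000)

fitControl <- trainControl(method = "cv", number = 5)

# For p predictors, randomForest's classification default is
# mtry = floor(sqrt(p)); a one-row grid pins it to a single value.
rfGrid <- expand.grid(mtry = 2)

fit <- train(Species ~ ., data = iris, method = "rf",
             trControl = fitControl, tuneGrid = rfGrid)

fit$finalModel$mtry  # 2
```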

Upvotes: 0

Views: 1438

Answers (1)

Len Greski

Reputation: 10865

On the final project for the Johns Hopkins Practical Machine Learning course on Coursera, a random forest will generate the same prediction for all 20 quiz test cases if students fail to remove independent variables that have more than 50% NA values.

SOLUTION: remove variables that have a high proportion of missing values from the model.
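A minimal sketch of that cleaning step, shown on a toy data frame (your column names and threshold may differ): compute the fraction of NA values per column and keep only columns at or below 50%.

```r
# Toy data frame: one clean predictor, one mostly-NA predictor, the outcome.
df <- data.frame(
  good      = 1:10,
  mostly_na = c(1, rep(NA, 9)),
  y         = factor(rep(c("a", "b"), 5))
)

# Fraction of NA values in each column.
na_frac <- colMeans(is.na(df))

# Keep only columns with 50% or fewer NA values.
clean <- df[, na_frac <= 0.5]

names(clean)  # "good" "y"
```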

Upvotes: 0
