Reputation: 71
I am trying to train a random forest on my training data, which has predictors like 'names' and 'city'. These two predictors each have more than 32 categories. What can I do to include them?
Some other algorithms, such as SVM or gbm, do not seem to handle predictors with this many categories either.
Upvotes: 2
Views: 2626
Reputation: 72769
It is generally recommended to avoid the formula interface to randomForest anyway, for reasons of speed. Instead, use model.matrix with your formula and feed the result to randomForest. Then you can have as many categories as you like, since they are dichotomized (i.e. dummied out, or turned into binary variables).
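A minimal sketch of that approach, assuming a made-up data frame: the `train` data, its `city`, `age` and `y` columns, and the `ntree` value are all hypothetical and only there for illustration. The randomForest call is guarded so the snippet still runs if the package is not installed.

```r
# Hypothetical toy data: 'city' has 40 levels, more than the
# 32-category limit mentioned in the question.
set.seed(1)
n <- 200
train <- data.frame(
  city = factor(sample(paste0("city_", 1:40), n, replace = TRUE),
                levels = paste0("city_", 1:40)),
  age  = rnorm(n, mean = 40, sd = 10)
)
train$y <- factor(ifelse(train$age + rnorm(n) > 40, "yes", "no"))

# Expand the factor into 0/1 indicator columns.
# "- 1" drops the intercept column, which a tree model does not need.
x <- model.matrix(y ~ . - 1, data = train)

# x is now a plain numeric matrix, so the limit on factor levels
# no longer applies to it.
dim(x)

# Fit through the x/y interface instead of the formula interface
# (only if the randomForest package is available).
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(x = x, y = train$y, ntree = 100)
  print(fit)
}
```

Each level of `city` becomes its own column, so a predictor with very many levels (such as names) can produce a wide matrix; that is worth keeping in mind for memory and training time.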
As @joran pointed out, you might want to think about your problem more as well.
Upvotes: 2