Ayush Raj Singh

Reputation: 71

Random forest does not seem to handle factors with more than 32 categories. What can I do to include these factors when training my model?

I am trying to train a random forest on training data that has predictors such as 'names' and 'city'. Each of these two predictors has more than 32 categories. How can I include them?

Some other algorithms, such as SVM or gbm, do not seem to handle factors with this many categories either.

Upvotes: 2

Views: 2626

Answers (1)

Ari B. Friedman

Reputation: 72769

It is generally recommended to avoid the formula interface to randomForest anyway, for reasons of speed. Instead, call model.matrix with your formula and pass the resulting matrix to randomForest. Then you can have as many categories as you'd like, since each factor is dichotomized (i.e. dummied out, or turned into one binary variable per category).
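
A minimal sketch of that approach, in case it helps. The data frame `train` and its columns `city`, `age` and `y` are made up for illustration, and the values of `ntree` and the seed are arbitrary; only `model.matrix` and the x/y form of `randomForest` come from the answer itself.

```r
# Sketch only: `train` and its columns are hypothetical example data.
library(randomForest)

set.seed(1)
train <- data.frame(
  city = factor(rep(paste0("city_", 1:50), each = 10)),  # 50 levels, more than 32
  age  = rnorm(500, mean = 40, sd = 10),
  y    = factor(sample(c("no", "yes"), 500, replace = TRUE))
)

# Expand every factor predictor into 0/1 indicator columns.
# "- 1" drops the intercept column, so `city` gets one column per level.
x <- model.matrix(y ~ . - 1, data = train)

# Use the x/y interface instead of the formula interface.
fit <- randomForest(x = x, y = train$y, ntree = 100)
```

Keep in mind that the indicator matrix grows by one column per category, and that any data passed to `predict` later would need to go through the same `model.matrix` call so that its columns match.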

As @joran pointed out, you might want to think about your problem more as well.

Upvotes: 2
