Reputation: 3029
I was curious to know whether reducing the number of classes in a supervised multi-class classification model (in particular, logistic regression) significantly helps to increase accuracy. For instance, if I have 50 classes for 10,000 samples and I reduce the number of classes to 30 by combining certain classes, will this significantly boost the accuracy of my classification model?
Upvotes: 1
Views: 3403
Reputation: 1691
It will definitely improve your accuracy if the classes you combine are similar and a significant number of samples are misclassified between them, because those errors disappear once the classes are merged.
For example:
If the classes you group are not similar, it will most likely not improve your accuracy, since you will not reduce the number of errors. Imagine that your classifier is so good that it never mistakes a cat for a dog or vice versa: combining these classes removes no errors, because there are none between them to remove.
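To make both cases concrete, here is a minimal sketch that merges classes in a confusion matrix and recomputes accuracy. The matrix, the class names (cat, dog, car) and the helper `merge_classes` are made up for illustration, and the sketch assumes the predictions are simply relabelled rather than the model being retrained on the merged labels.

```python
import numpy as np

def accuracy(cm):
    """Fraction of samples on the diagonal of a confusion matrix."""
    return np.trace(cm) / cm.sum()

def merge_classes(cm, i, j):
    """Hypothetical helper: fold class j into class i (rows = true, cols = predicted)."""
    merged = cm.copy()
    merged[i, :] += merged[j, :]   # true label j now counts as true label i
    merged[:, i] += merged[:, j]   # predicted label j now counts as predicted label i
    merged = np.delete(merged, j, axis=0)
    merged = np.delete(merged, j, axis=1)
    return merged

# Made-up confusion matrix for classes [cat, dog, car]:
# cats and dogs are often confused with each other, cars never are.
cm = np.array([
    [70, 30,   0],
    [25, 75,   0],
    [ 0,  0, 100],
])

acc_before = accuracy(cm)                         # 245 / 300
acc_cat_dog = accuracy(merge_classes(cm, 0, 1))   # confused pair merged
acc_cat_car = accuracy(merge_classes(cm, 0, 2))   # unconfused pair merged

print(acc_before, acc_cat_dog, acc_cat_car)
```

Merging cat and dog removes all 55 cat/dog errors, while merging cat and car leaves the accuracy where it was, because no errors existed between those two classes.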
Upvotes: 9
Reputation: 19169
The effects of reducing the number of classes depend on both the algorithm and the data set. In general, there is no guarantee that reducing the number of classes will increase classification accuracy. In many cases the opposite is true: increasing the number of classes can improve classification accuracy.
For example, for many data sets, you could make each observation correspond to a unique class and end up with 100% classification accuracy on the training data. This is an obvious example of overfitting, but it makes the point that increasing (as opposed to decreasing) the number of classes can sometimes improve classification accuracy.
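A rough sketch of that extreme case is below. It assumes random 2-D data and uses a memorizing 1-nearest-neighbour rule instead of logistic regression, purely to keep the example dependency-free; `predict_1nn` is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 2))   # 20 made-up observations
y_train = np.arange(20)              # one unique class per observation

def predict_1nn(X_ref, y_ref, X_query):
    """Hypothetical helper: label each query point with its nearest reference point's label."""
    # Pairwise squared Euclidean distances, shape (n_query, n_ref)
    d = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    return y_ref[d.argmin(axis=1)]

# Each training point is its own nearest neighbour, so every label is recovered.
train_acc = (predict_1nn(X_train, y_train, X_train) == y_train).mean()
print(train_acc)
```

The perfect score here says nothing about generalization: a new observation cannot belong to any of these single-member classes, which is why this is overfitting rather than a useful model.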
Upvotes: 3