JJ.Y

Reputation: 325

Machine Learning -- how to improve the classification of certain classes

I am using a Random Forest for a classification problem. The response has 5 classes. All classes are equally represented in the training set; in the test set, however, two particular classes make up the vast majority. What makes this challenging is that, on the validation set, those same two classes also have the worst accuracy. So my question is: are there ways to improve the classification accuracy of these two specific classes, and thereby improve my overall prediction?

Any input will be much appreciated!

Answers (1)

Tomer Levinboim

Reputation: 1012

One simple way is to change the objective function so that misclassifying certain classes incurs more (or less) loss. For example, suppose the predictions are denoted by Y and the ground truth by T (both vectors). Then the usual loss function is simply:

$$\text{total\_loss}(Y, T) = \sum_n \text{loss}(y_n, t_n)$$

Here the penalty for a misclassification is the same for every class. This can be modified to:

$$\text{total\_loss}(Y, T) = \sum_n C(t_n)\,\text{loss}(y_n, t_n)$$

where C(t_n) is the weight assigned to class t_n, so a larger C(t_n) makes errors on that class more costly.
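
To make the weighted objective concrete, here is a minimal sketch in plain Python. It assumes a 0-1 per-example loss (1 for a mistake, 0 otherwise) and gives weight 1 to any class not listed in C; the names `total_loss` and `zero_one_loss` are illustrative, not from any library.

```python
from typing import Callable, Dict, Hashable, Optional, Sequence


def zero_one_loss(y: Hashable, t: Hashable) -> float:
    """Per-example loss: 1.0 for a misclassification, 0.0 otherwise."""
    return 0.0 if y == t else 1.0


def total_loss(
    Y: Sequence[Hashable],
    T: Sequence[Hashable],
    C: Optional[Dict[Hashable, float]] = None,
    loss: Callable[[Hashable, Hashable], float] = zero_one_loss,
) -> float:
    """Compute sum_n C(t_n) * loss(y_n, t_n).

    With C=None every class gets weight 1, which gives the unweighted loss.
    Classes missing from C also get weight 1 (an assumption of this sketch).
    """
    if len(Y) != len(T):
        raise ValueError("Y and T must have the same length")
    total = 0.0
    for y_n, t_n in zip(Y, T):
        weight = 1.0 if C is None else C.get(t_n, 1.0)
        total += weight * loss(y_n, t_n)
    return total


# Two errors: one on a true "a" and one on a true "b".
Y = ["a", "b", "b", "c"]
T = ["a", "a", "b", "b"]
print(total_loss(Y, T))                # 2.0
print(total_loss(Y, T, C={"b": 5.0}))  # 6.0 (the error on "b" now costs 5)
```

Raising C for a class makes every mistake on that class count more, which pushes a learner that minimises this quantity toward getting that class right, at some cost to the others.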

You can then tune C to maximize performance on the dev set and hope to see an improvement on the test set (assuming the label distribution of the dev set is similar to that of the test set).
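
Tuning C can be as simple as a grid search over candidate weights for the two hard classes, keeping every other class at weight 1. The sketch below assumes you supply a function `fit_and_score(weights)` (hypothetical) that trains a model with those class weights and returns its dev-set score. With scikit-learn, for instance, `RandomForestClassifier` accepts a `class_weight` dict mapping class labels to weights, so that function could wrap it; `X_train`, `y_train`, `X_dev`, `y_dev` stand for your own data. The toy scorer here only exists so the example runs on its own.

```python
import itertools
from typing import Callable, Dict, Hashable, Iterable, Sequence, Tuple


def tune_class_weights(
    classes: Sequence[Hashable],
    boosted: Sequence[Hashable],
    candidates: Iterable[float],
    fit_and_score: Callable[[Dict[Hashable, float]], float],
) -> Tuple[Dict[Hashable, float], float]:
    """Grid-search the weights C of the `boosted` classes on a dev set.

    Every class not in `boosted` keeps weight 1. `fit_and_score(weights)` must
    train a model with those class weights and return its dev-set score
    (higher is better).
    """
    candidates = list(candidates)
    best_weights, best_score = None, float("-inf")
    # Try every combination of candidate weights for the boosted classes.
    for combo in itertools.product(candidates, repeat=len(boosted)):
        weights = {c: 1.0 for c in classes}
        weights.update(zip(boosted, combo))
        score = fit_and_score(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    if best_weights is None:
        raise ValueError("no candidate weights were evaluated")
    return best_weights, best_score


# Hypothetical stand-in for "train, then score on the dev set". With
# scikit-learn this could instead be, e.g.:
#   lambda w: (RandomForestClassifier(class_weight=w)
#              .fit(X_train, y_train).score(X_dev, y_dev))
def toy_fit_and_score(weights: Dict[Hashable, float]) -> float:
    return 10.0 - abs(weights["B"] - 2.0) - abs(weights["C"] - 4.0)


best, score = tune_class_weights(
    classes=["A", "B", "C", "D", "E"],
    boosted=["B", "C"],
    candidates=[1.0, 2.0, 4.0, 8.0],
    fit_and_score=toy_fit_and_score,
)
print(best)   # {'A': 1.0, 'B': 2.0, 'C': 4.0, 'D': 1.0, 'E': 1.0}
print(score)  # 10.0
```

Because the score is measured on the dev set, the chosen weights only transfer to the test set to the extent that the dev set's label distribution resembles the test set's.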

If this sounds like the right approach, you might want to read a bit about decision theory (Section 1.5 of Bishop's PRML book) and cost-sensitive learning (here and here).
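
The decision-theory view referenced above can also be applied at prediction time, without retraining: given a loss matrix L, where L[k][j] is the cost of predicting class j when the true class is k, pick the class j that minimises the expected loss sum_k L[k][j] p(k|x). A minimal sketch, assuming you already have class probabilities p(k|x) for an input (for example one row of a scikit-learn classifier's `predict_proba` output); the function name is illustrative. Setting every off-diagonal entry of row k to C(k) reproduces the per-class weighting in the formula above.

```python
from typing import Sequence


def min_expected_loss_decision(
    posterior: Sequence[float],
    loss_matrix: Sequence[Sequence[float]],
) -> int:
    """Return the class j that minimises sum_k L[k][j] * p(k | x).

    posterior[k] is p(k | x); loss_matrix[k][j] is the cost of predicting
    class j when the true class is k.
    """
    n = len(posterior)
    expected = [
        sum(loss_matrix[k][j] * posterior[k] for k in range(n))
        for j in range(n)
    ]
    return min(range(n), key=lambda j: expected[j])


p = [0.6, 0.4]               # e.g. one row of predicted class probabilities
uniform = [[0, 1], [1, 0]]   # every mistake costs 1
costly_1 = [[0, 1], [5, 0]]  # missing true class 1 costs 5
print(min_expected_loss_decision(p, uniform))   # 0 (same as plain argmax)
print(min_expected_loss_decision(p, costly_1))  # 1
```

With a uniform loss matrix this reduces to picking the most probable class; raising the cost of missing a class shifts borderline inputs toward that class.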
