Encipher

Reputation: 2936

Accuracy degrade with highly skewed data after handling imbalance problem

After preprocessing my data (missing value replacement and outlier detection), I partitioned it using the Randomize and RemovePercentage filters in WEKA. My dataset is highly skewed, with an imbalance ratio of 6:1 between the negative and positive classes. If I classify the data using the Naive Bayes classifier without handling the class imbalance, I get 83% accuracy with a recall of 0.623. However, if I handle the class imbalance (balancing to 1:1) with the supervised > instances > Resample or supervised > instances > SpreadSubsample filter and then apply Naive Bayes, accuracy degrades to 77% with a recall of 0.456.

Why does my accuracy degrade after handling the class imbalance?

Upvotes: 0

Views: 122

Answers (1)

fracpete

Reputation: 2608

If you have a class imbalance of 6:1, then the majority class makes up 6/7 = 85.7% of the data. Just by always predicting the majority class (e.g. using ZeroR) you would get an accuracy slightly better than what NaiveBayes achieves.

After balancing your dataset, NaiveBayes reports 77% accuracy, which is well above the 50% you would get for always predicting one class.

NaiveBayes has, in some sense, actually improved.
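The arithmetic can be sketched as follows (the absolute class counts are illustrative; only the 6:1 ratio comes from the question):

```python
def majority_baseline(neg, pos):
    """Accuracy of a trivial classifier that always predicts the majority class."""
    return max(neg, pos) / (neg + pos)

# Imbalanced 6:1 split: the trivial baseline is 6/7 ~= 85.7%,
# so NaiveBayes at 83% is actually *below* the trivial baseline.
baseline_imbalanced = majority_baseline(600, 100)
print(f"baseline (6:1): {baseline_imbalanced:.3f}")

# After balancing to 1:1 the trivial baseline drops to 50%,
# so NaiveBayes at 77% is 27 points *above* the trivial baseline.
baseline_balanced = majority_baseline(350, 350)
print(f"baseline (1:1): {baseline_balanced:.3f}")

print(f"NB margin over baseline, imbalanced: {0.83 - baseline_imbalanced:+.3f}")
print(f"NB margin over baseline, balanced:   {0.77 - baseline_balanced:+.3f}")
```

In other words, raw accuracy is only meaningful relative to the class distribution, which is why the "lower" 77% figure reflects a genuinely better classifier.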

Upvotes: 2

Related Questions