Reputation: 2936
After preprocessing my data (missing value replacement and outlier detection), I partitioned it in WEKA using the Randomize and RemovePercentage filters. My dataset is highly skewed, with a 6:1 imbalance ratio between the negative and positive classes. If I classify the data with the Naive Bayes classifier without handling the class imbalance, I get 83% accuracy with a recall of 0.623. However, if I handle the class imbalance (balancing to 1:1) with the supervised instance Resample or SpreadSubsample filter and then apply Naive Bayes, accuracy degrades to 77% with a recall of 0.456.
Why does my accuracy degrade when I handle the class imbalance?
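For reference, the balancing and evaluation step looks roughly like this in WEKA's Java API. This is only a sketch of the setup I described, not my exact code; the file name data.arff and the class being the last attribute are placeholders.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.SpreadSubsample;

public class BalanceAndClassify {
    public static void main(String[] args) throws Exception {
        // Load the already-preprocessed dataset (placeholder file name).
        Instances data = DataSource.read("data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // SpreadSubsample with distributionSpread = 1.0 enforces a 1:1 class
        // ratio by subsampling the majority class.
        SpreadSubsample balance = new SpreadSubsample();
        balance.setDistributionSpread(1.0);
        balance.setInputFormat(data);
        Instances balanced = Filter.useFilter(data, balance);

        // 10-fold cross-validation of NaiveBayes on the balanced data.
        Evaluation eval = new Evaluation(balanced);
        eval.crossValidateModel(new NaiveBayes(), balanced, 10, new Random(1));
        // recall(1) assumes the positive class is the second class value.
        System.out.printf("Accuracy: %.1f%%  Recall(+): %.3f%n",
                eval.pctCorrect(), eval.recall(1));
    }
}
```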
Upvotes: 0
Views: 122
Reputation: 2608
If you have a class imbalance of 6:1, then the majority class makes up 6/7 ≈ 85.7% of the instances. Just by always predicting the majority class (e.g. using ZeroR) you would get an accuracy slightly better than what NaiveBayes achieves.
After balancing your dataset, NaiveBayes reports 77% accuracy, which is well above the 50% baseline of predicting the majority class.
In that sense, NaiveBayes has actually improved.
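You can check this yourself by comparing both classifiers against the ZeroR baseline on the original and the rebalanced data. A rough sketch, assuming an ARFF file named data.arff with the class as the last attribute (both are placeholders):

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.SpreadSubsample;

public class BaselineComparison {
    // 10-fold cross-validated accuracy of a classifier on a dataset.
    static double cvAccuracy(Classifier cls, Instances data) throws Exception {
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(cls, data, 10, new Random(1));
        return eval.pctCorrect();
    }

    public static void main(String[] args) throws Exception {
        Instances original = DataSource.read("data.arff"); // placeholder name
        original.setClassIndex(original.numAttributes() - 1);

        // Rebalance to a 1:1 class distribution, as in your second experiment.
        SpreadSubsample balance = new SpreadSubsample();
        balance.setDistributionSpread(1.0);
        balance.setInputFormat(original);
        Instances balanced = Filter.useFilter(original, balance);

        // ZeroR always predicts the majority class, so its accuracy is the
        // baseline a real classifier has to beat: about 85.7% on the 6:1 data
        // and about 50% on the 1:1 data.
        System.out.printf("Original  ZeroR: %.1f%%  NaiveBayes: %.1f%%%n",
                cvAccuracy(new ZeroR(), original), cvAccuracy(new NaiveBayes(), original));
        System.out.printf("Balanced  ZeroR: %.1f%%  NaiveBayes: %.1f%%%n",
                cvAccuracy(new ZeroR(), balanced), cvAccuracy(new NaiveBayes(), balanced));
    }
}
```

83% vs. an 85.7% baseline is worse than chance-level guessing of the majority class, while 77% vs. a 50% baseline is a substantial improvement, which is why raw accuracy is misleading on imbalanced data.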
Upvotes: 2