Reputation: 33
I have highly imbalanced training and test datasets with 15 features for an anomaly/failure detection problem. The training set has around 60,000 instances, of which 88 are "fail" and the rest are "pass" events. The test set has around 35,000 instances, of which only 46 are "fail" and the rest are "pass". What is a good classifier and approach to detect "fail" events?
I have tried both oversampling (the "fail" instances) and undersampling (the "pass" instances) the training set to reach a balanced dataset, but the overall classification accuracy on the test set never goes beyond 60%. Please suggest a good classifier and any useful techniques that you may know.
Upvotes: 0
Views: 198
Reputation: 189
Since your dataset is highly skewed (88 "fail" events in about 60,000 training instances, roughly 1 in 700), anomaly detection techniques might help you achieve higher accuracy.
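To make the idea concrete, here is a minimal sketch of one simple anomaly detection approach: model only the "pass" events, then flag anything that sits far from them. Everything in it is an assumption for illustration, not your data. The rows are synthetic, the "fail" rows are assumed to be shifted away from the "pass" rows, the score is a plain sum of squared z-scores, and the 99th-percentile threshold is arbitrary. The helper names (`fit_gaussian`, `anomaly_score`, `precision_recall`) are hypothetical. It also reports precision and recall on the "fail" class, because with this much imbalance a model that always predicts "pass" already gets over 99% accuracy.

```python
import random
import statistics

def fit_gaussian(rows):
    """Per-feature mean and standard deviation, estimated from 'pass' rows only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return means, stds

def anomaly_score(row, means, stds):
    """Sum of squared z-scores: larger means further from typical 'pass' behaviour."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds))

def precision_recall(y_true, y_pred):
    """Precision and recall for the 'fail' class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Synthetic stand-in data (assumption): 15 features, 'fail' rows shifted by 3.
rng = random.Random(0)
N_FEATURES = 15

def make_rows(n, shift):
    return [[rng.gauss(shift, 1.0) for _ in range(N_FEATURES)] for _ in range(n)]

train_pass = make_rows(2000, 0.0)
test_pass = make_rows(1000, 0.0)
test_fail = make_rows(5, 3.0)

# Fit on 'pass' events only, then pick a threshold from the training scores.
means, stds = fit_gaussian(train_pass)
train_scores = sorted(anomaly_score(r, means, stds) for r in train_pass)
threshold = train_scores[int(0.99 * len(train_scores))]  # arbitrary 99th percentile

X_test = test_pass + test_fail
y_test = [0] * len(test_pass) + [1] * len(test_fail)
y_pred = [1 if anomaly_score(r, means, stds) > threshold else 0 for r in X_test]

precision, recall = precision_recall(y_test, y_pred)
accuracy_all_pass = y_test.count(0) / len(y_test)  # baseline: always predict 'pass'

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"accuracy of always predicting 'pass': {accuracy_all_pass:.4f}")
```

On real data the "fail" events may not separate this cleanly, so the threshold is the main thing to tune: lowering it catches more failures at the cost of more false alarms.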
Upvotes: 1