Binny

Reputation: 118

Is the result produced after SMOTE reliable?

I have a skewed dataset of tweets from Twitter and the sentiments associated with them. The ratio of positive to negative sentiment is around 1:4 in the training set. When I ran the training set in Weka without SMOTE, the results were unsatisfactory, so I used SMOTE to balance the classes. The results I got after that were far better. I used LibSVM for classification.

How reliable is a model trained on SMOTE-balanced data? Can we always use SMOTE for such imbalanced datasets? I am new to ML and Weka, so I don't know much about these things.

Upvotes: 1

Views: 1446

Answers (1)

Rushdi Shams

Reputation: 2423

It depends. Oversampling and undersampling, whether random or synthetic, each have pros and cons. Check the results by comparing the training error with the cross-validation or test error, and by plotting learning curves with the error rate on the y-axis and the training set size on the x-axis. This lets you detect overly optimistic results and judge how well the model generalizes; sometimes a good score is just the result of overfitting. I have used SMOTE and got good results, but I still had to run the checks above to see how good that "good" really was.

For a class imbalance problem you can also keep the dataset as it is and apply a cost-sensitive learner, which is penalized for false positives (FP) and false negatives (FN) according to some weights. Alternatively, apply a regular algorithm to the imbalanced dataset and then use a cost-sensitive evaluation such as a cost curve. This curve can tell you how your model would have performed had it been given a 50-50 balanced dataset.
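To make the train-versus-test comparison meaningful, one common precaution (my assumption here, not something stated above) is to hold out the test portion first and oversample only the training portion, so no synthetic points leak into the data used for checking. Below is a minimal pure-Python sketch of that idea. It is not Weka's SMOTE filter; the helper names `smote` and `split_then_oversample` and the 2-D toy data are hypothetical and only for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def split_then_oversample(X, y, minority_label, test_fraction=0.25, seed=0):
    """Stratified hold-out split for a two-class problem, followed by
    SMOTE on the training part only. The test part keeps the original
    class ratio and contains no synthetic points."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]

    # Add just enough synthetic minority points to match the majority count.
    minority = [x for x, lab in zip(X_train, y_train) if lab == minority_label]
    n_needed = (len(y_train) - len(minority)) - len(minority)
    synthetic = smote(minority, n_needed, rng=rng)
    X_train = X_train + synthetic
    y_train = y_train + [minority_label] * len(synthetic)
    return X_train, y_train, X_test, y_test


# Made-up toy data with the 1:4 positive:negative ratio from the question.
data_rng = random.Random(42)
X = (
    [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(20)]
    + [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(80)]
)
y = ["pos"] * 20 + ["neg"] * 80
X_train, y_train, X_test, y_test = split_then_oversample(X, y, "pos")
```

Train the classifier on `X_train`/`y_train` and report the error on both the training data and the untouched `X_test`/`y_test`. A large gap between the two would suggest that the improvement seen after SMOTE is at least partly overfitting.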

Upvotes: 2
