Reputation: 9
According to the paper by Chawla et al. (2002), the best performance when balancing data comes from combining under-sampling with SMOTE.
I've tried to balance my dataset by combining under-sampling and SMOTE, but I am a bit confused about the attribute for under-sampling.
In Weka there is the Resample filter, which can decrease the majority class. Resample has an attribute biasToUniformClass: "Whether to use bias towards a uniform class. A value of 0 leaves the class distribution as-is, a value of 1 ensures the class distribution is uniform in the output data."
When I use value 0, the number of instances in the majority class goes down, and so does the number in the minority class. When I use value 1, the majority class decreases but the minority class increases.
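To make the behaviour described above concrete, here is a minimal Python sketch of how a class-biased resample could pick its per-class sizes. It assumes the output share of each class is a linear blend between its original share and a uniform share, weighted by the bias value. This is only a model of the documented behaviour, not Weka's actual code; the function name `resample_class_counts` and the `sample_size_percent` parameter are hypothetical, and Weka's exact rounding may differ.

```python
def resample_class_counts(class_counts, bias_to_uniform, sample_size_percent=100.0):
    """Estimate per-class output sizes for a class-biased resample.

    Assumption: each class's share is a linear blend of its original share
    (bias 0) and a uniform share (bias 1). Not taken from Weka's source.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    sample_size = total * sample_size_percent / 100.0

    estimated = {}
    for label, count in class_counts.items():
        original_share = count / total
        uniform_share = 1.0 / num_classes
        share = (1.0 - bias_to_uniform) * original_share + bias_to_uniform * uniform_share
        estimated[label] = round(sample_size * share)
    return estimated


counts = {"majority": 20, "minority": 10}

# Bias 0 with a 50% sample: both classes shrink, the ratio stays 2:1.
print(resample_class_counts(counts, bias_to_uniform=0.0, sample_size_percent=50.0))

# Bias 1 with a 100% sample: the majority shrinks and the minority grows.
print(resample_class_counts(counts, bias_to_uniform=1.0))
```

Under this model, value 1 is not pure under-sampling: with 20 majority and 10 minority instances it yields 15 of each, so the minority class can only reach 15 by having instances drawn more than once (or by sampling with replacement).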
I tried value 1 for that attribute without using SMOTE to increase the instances of the minority class, because the data is already balanced, and the result is good too.
So, is that the same as combining SMOTE and under-sampling, or do I still have to use value 0 for that attribute and then apply SMOTE?
Upvotes: 2
Views: 2110
Reputation: 116
For undersampling, see the EasyEnsemble algorithm (a Weka implementation was developed by Schubach, Robinson, and Valentini).
The EasyEnsemble algorithm allows you to split the data into a certain number of balanced partitions. To achieve this balance, set the numIterations parameter equal to:
numIterations = (# of majority instances) / (# of minority instances)
For example, if there are 30 total instances with 20 in the majority class and 10 in the minority class, set the numIterations parameter equal to 2 (i.e., 20 majority instances / 10 minority instances equals 2 balanced partitions). These 2 partitions should each contain 20 instances: the same 10 minority instances along with a different 10 instances from the majority class.
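The partitioning described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the Weka EasyEnsemble implementation; the function name `balanced_partitions` and the `seed` parameter are hypothetical, and the sketch assumes the majority class is simply shuffled and cut into disjoint chunks the size of the minority class.

```python
import random


def balanced_partitions(majority, minority, seed=0):
    """Split the majority class into disjoint chunks, each paired with all minority instances."""
    # numIterations = (# of majority instances) / (# of minority instances)
    num_iterations = len(majority) // len(minority)

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)

    size = len(minority)
    partitions = []
    for i in range(num_iterations):
        majority_chunk = shuffled[i * size:(i + 1) * size]
        # Every partition reuses the same minority instances.
        partitions.append(majority_chunk + list(minority))
    return partitions


majority = [("majority", i) for i in range(20)]
minority = [("minority", i) for i in range(10)]

partitions = balanced_partitions(majority, minority)
print(len(partitions), [len(p) for p in partitions])
```

With 20 majority and 10 minority instances this produces 2 partitions of 20 instances each. If the majority count is not an exact multiple of the minority count, the integer division here drops the leftover majority instances; that is a choice of this sketch.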
The algorithm then trains a classifier on each of the balanced partitions and, at test time, combines these classifiers into an ensemble for prediction.
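As a rough illustration of the test-time step, the sketch below combines several already-trained classifiers by simple majority vote. The voting rule is an assumption made for this example; the actual EasyEnsemble implementation may combine its classifiers differently (for instance by summing scores). The name `ensemble_predict` and the toy threshold classifiers are hypothetical.

```python
from collections import Counter


def ensemble_predict(classifiers, instance):
    """Return the label predicted by most classifiers (assumed majority vote)."""
    votes = Counter(classify(instance) for classify in classifiers)
    # most_common(1) returns the label with the highest vote count.
    return votes.most_common(1)[0][0]


# Toy stand-ins for classifiers trained on different balanced partitions.
classifiers = [
    lambda x: "minority" if x > 0.4 else "majority",
    lambda x: "minority" if x > 0.5 else "majority",
    lambda x: "minority" if x > 0.6 else "majority",
]

print(ensemble_predict(classifiers, 0.55))
```

Here two of the three toy classifiers vote "minority" for the input 0.55, so that label wins.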
Upvotes: 2