Reputation: 1
My data set has only a small number of instances, so I tried the "Resample" filter in Weka to increase the amount of data and improve model performance. Is it okay to set the sample size percentage to 200? With that setting I get a good correlation coefficient in cross-validation.
I want to know whether the Resample filter works correctly with the sample size percentage set to 200, and whether my model will predict accurately after applying it. Since I have so little data, are there other augmentation methods that could improve my model's performance?
Upvotes: -1
Views: 71
Reputation: 2608
If you are using the Resample filter (supervised or unsupervised) inside a FilteredClassifier meta-classifier, then it is safe to use: the resampling is applied only to the training data of each fold, and the test data is left as it is.
If you are applying it from the Preprocess panel, then you are generating duplicates in the overall dataset. When you run cross-validation on this augmented dataset, copies of the same instance will end up in both the training and the test splits, so the model is partly evaluated on data it has already seen. That could explain the improvement you have seen.
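To make the difference between the two workflows concrete, here is a minimal sketch in plain Python (not Weka code). It assumes that a 200% resample behaves roughly like drawing twice as many rows with replacement; the helper names `resample`, `k_folds` and `leaked_fraction` are made up for this illustration, and integer ids stand in for the actual instances.

```python
import random

def resample(rows, percent, rng):
    """Draw len(rows) * percent / 100 rows with replacement (assumed stand-in for a 200% resample)."""
    n = int(len(rows) * percent / 100)
    return [rng.choice(rows) for _ in range(n)]

def k_folds(rows, k, rng):
    """Shuffle the rows and yield k (train, test) pairs."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

def leaked_fraction(train, test):
    """Fraction of test rows whose instance id also occurs in the training rows."""
    train_ids = set(train)
    return sum(1 for r in test if r in train_ids) / len(test)

rng = random.Random(0)
original = list(range(50))  # hypothetical data set: 50 instance ids

# Workflow A: resample the whole data set first, then cross-validate
# (what happens when the filter is applied in the Preprocess panel).
augmented = resample(original, 200, rng)
leak_before = [leaked_fraction(tr, te) for tr, te in k_folds(augmented, 10, rng)]

# Workflow B: cross-validate on the original data and resample only
# the training part of each fold (the FilteredClassifier-style setup).
leak_inside = [
    leaked_fraction(resample(tr, 200, rng), te)
    for tr, te in k_folds(original, 10, rng)
]

print("mean leaked fraction, resample before CV:", sum(leak_before) / len(leak_before))
print("mean leaked fraction, resample inside CV:", sum(leak_inside) / len(leak_inside))
```

In workflow A a share of the test rows are exact copies of training rows, which inflates the cross-validation score. In workflow B that share is zero by construction, because the test instances never enter the resampling step.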
An alternative to Resample is the SMOTE filter (separate package).
Upvotes: 0