I am using WEKA for a classification problem on a dataset in ARFF format.
I want to use SMOTE on my dataset because it has a class imbalance. However, whenever I do, SMOTE generates 'impossible' attribute values for some of the new synthetic instances. For example, the attribute 'number_of_bedrooms' must be a whole number, yet after applying SMOTE some of its values are 3.5, etc.
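To see where values like 3.5 come from, here is a minimal sketch of the interpolation step at the core of the SMOTE algorithm, for a single numeric attribute: a synthetic value is placed at a random point on the line segment between a minority-class instance and one of its nearest minority-class neighbours. This is not WEKA's SMOTE code; the class name, method name and bedroom values are hypothetical.

```java
import java.util.Random;

// Hypothetical illustration of SMOTE-style interpolation, not WEKA's implementation.
class SmoteInterpolation {
    // Interpolation for one numeric attribute:
    // synthetic = x + gap * (neighbour - x), where gap lies in [0, 1).
    static double interpolate(double x, double neighbour, double gap) {
        return x + gap * (neighbour - x);
    }

    public static void main(String[] args) {
        double bedroomsA = 3.0; // hypothetical minority-class instance
        double bedroomsB = 4.0; // hypothetical nearest minority-class neighbour

        // A gap of exactly 0.5 lands halfway between the two instances.
        System.out.println(interpolate(bedroomsA, bedroomsB, 0.5)); // prints 3.5

        // In practice the gap is random, so any value in [3, 4) can appear.
        Random rng = new Random();
        System.out.println(interpolate(bedroomsA, bedroomsB, rng.nextDouble()));
    }
}
```

Between 3 and 4 bedrooms, every gap other than 0 yields a fractional value, which is why an attribute that is really a count picks up non-integer values after oversampling.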
I want to apply some kind of filter in WEKA so that this attribute can only take whole-number (integer) values. Do I need to discretize it? Would that be appropriate for an attribute like number of rooms?
If I do discretize, should there be one bin per bedroom count present in the data, i.e. one bin each for 1, 2, 3, 4 and 5 bedrooms? Or should the bins take the target class into account, giving something more like 1, 2-3 and 4+ bedrooms, to aid classification?
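For the class-aware option, WEKA's supervised Discretize filter uses an entropy-based method (Fayyad and Irani's MDL criterion by default). The sketch below is a deliberately simplified stand-in, not that algorithm: it finds only the single cut point that minimises the weighted class entropy of the two resulting bins, with no recursion and no MDL stopping rule. The data, class name and method names are hypothetical.

```java
import java.util.Arrays;

// Simplified, hypothetical illustration of a class-aware (entropy-based) cut.
class EntropyCut {
    // Base-2 entropy of a two-class bin with `pos` and `neg` instances.
    static double entropy(int pos, int neg) {
        int n = pos + neg;
        if (n == 0) return 0.0;
        double h = 0.0;
        for (int c : new int[] {pos, neg}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Tries a cut halfway between each pair of adjacent distinct values and
    // returns the one with the lowest weighted entropy of the two bins.
    static double bestCut(int[] values, boolean[] labels) {
        int[] distinct = Arrays.stream(values).distinct().sorted().toArray();
        double bestCut = Double.NaN;
        double bestH = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < distinct.length; i++) {
            double cut = (distinct[i] + distinct[i + 1]) / 2.0;
            int lp = 0, ln = 0, rp = 0, rn = 0; // class counts left/right of the cut
            for (int j = 0; j < values.length; j++) {
                boolean left = values[j] < cut;
                if (left && labels[j]) lp++;
                else if (left) ln++;
                else if (labels[j]) rp++;
                else rn++;
            }
            double h = ((lp + ln) * entropy(lp, ln) + (rp + rn) * entropy(rp, rn)) / values.length;
            if (h < bestH) {
                bestH = h;
                bestCut = cut;
            }
        }
        return bestCut;
    }

    public static void main(String[] args) {
        int[] bedrooms = {1, 1, 2, 2, 3, 3, 4, 4, 5, 5};                            // hypothetical data
        boolean[] minority = {false, false, false, false, false, false, true, true, true, true};
        System.out.println(bestCut(bedrooms, minority)); // prints 3.5
    }
}
```

With this made-up data the classes separate perfectly between 3 and 4 bedrooms, so the cut groups several bedroom counts into one bin. Real data rarely splits this cleanly, which is why the actual filter applies a stopping criterion rather than taking every cut.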
I have tried applying the following filters (all settings are default unless listed below; I am using the GUI rather than writing code or using the command line):
- weka.filters.unsupervised.attribute.Discretize
  - binRangePrecision = 0
  - bins = 10 (this was the default, but I don't know whether to change it)
  - findNumBins = True
- weka.filters.unsupervised.attribute.NumericToNominal
- weka.filters.supervised.attribute.Discretize
  - binRangePrecision = 0
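The unsupervised Discretize filter produces equal-width bins by default (with findNumBins = True it chooses the number of bins itself, treating bins as an upper limit). As a rough illustration of why 10 fixed equal-width bins are awkward for a small integer count, the sketch below splits the range 1 to 5 into 10 equal intervals. It is an assumption that this generic formula matches WEKA's cut points; the class and method names are hypothetical.

```java
// Hypothetical illustration of generic equal-width binning, not WEKA's code.
class EqualWidthBins {
    // Splits [min, max] into `bins` equal intervals and returns the 0-based
    // bin index of v; the maximum value is placed in the last bin.
    static int binIndex(int v, int min, int max, int bins) {
        if (max == min) return 0;                       // degenerate range: one bin
        int idx = (v - min) * bins / (max - min);       // integer division floors non-negative values
        return Math.min(idx, bins - 1);
    }

    public static void main(String[] args) {
        int min = 1, max = 5, bins = 10;                // bedroom counts 1..5, 10 bins
        boolean[] used = new boolean[bins];
        for (int v = min; v <= max; v++) {
            int b = binIndex(v, min, max, bins);
            used[b] = true;
            System.out.println(v + " bedrooms -> bin " + b);
        }
        int empty = 0;
        for (boolean u : used) if (!u) empty++;
        System.out.println("empty bins: " + empty);     // prints "empty bins: 5"
    }
}
```

Only 5 of the 10 intervals ever receive a value, so under this assumption the extra bins add nothing for an attribute with five distinct counts.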
Answer:
You could use the weka.filters.unsupervised.attribute.NumericToNominal filter to convert your numeric number-of-bedrooms attribute into a nominal one. This filter simply turns each numeric value into its string representation, which is then used as a label of the nominal attribute.
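Conceptually, that conversion builds one nominal label per distinct numeric value. The sketch below imitates it with a sorted set of distinct values; the label formatting (whole numbers without ".0"), the label ordering and all names are assumptions for illustration, not a description of WEKA's exact output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of numeric-to-nominal conversion, not WEKA's code.
class NumericToNominalSketch {
    // One label per distinct value, in ascending numeric order.
    static List<String> labels(double[] column) {
        TreeSet<Double> distinct = new TreeSet<>();
        for (double v : column) distinct.add(v);
        List<String> out = new ArrayList<>();
        for (double v : distinct) out.add(label(v));
        return out;
    }

    // Assumed formatting: whole numbers print without a trailing ".0".
    static String label(double v) {
        return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }

    public static void main(String[] args) {
        double[] bedrooms = {3, 1, 2, 3, 5, 4, 2};      // hypothetical column
        System.out.println(labels(bedrooms));          // prints [1, 2, 3, 4, 5]
    }
}
```

Because the labels come from whatever values exist when the filter runs, converting after SMOTE would just turn 3.5 into a label of its own. Converting before oversampling means synthetic instances must take one of the existing labels, assuming the oversampler copies an existing nominal value rather than interpolating.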