csprod

Reputation: 1

How to treat integer attributes in WEKA, e.g. number of bedrooms (cannot be float values)

I am using WEKA for a classification problem on a dataset in ARFF file format.

I want to use SMOTE on my dataset since I have a class imbalance; however, whenever I do this, it generates 'impossible' attribute values for some of these new synthetic instances. For example, an attribute 'number_of_bedrooms' cannot be a float value, yet after applying SMOTE, some of the values will be 3.5 etc.

I want to apply some sort of filter in WEKA so that this specific attribute can only take whole integer values. Do I need to discretize this attribute? Would this be right for an attribute like number of rooms?

If I do discretize, should there be one bin per distinct value present in the dataset, i.e. one bin for each of 1, 2, 3, 4, or 5 bedrooms? Or should the bins take the target class into account, which would give something more like 1, 2-3, and 4+ bedrooms to aid with classification?

I have tried applying the following filters: (NOTE: all settings are default unless specified below. I am using the GUI and not coding in the terminal, formatting here wants the lines as code/blockquote)

weka.filters.unsupervised.attribute.Discretize
binRangePrecision = 0
bins = 10 (this was the default but I don't know whether to change it)
findNumBins = True

weka.filters.unsupervised.attribute.NumericToNominal

weka.filters.supervised.attribute.Discretize
binRangePrecision = 0

Upvotes: 0

Views: 38

Answers (1)

fracpete

Reputation: 2608

You could use the weka.filters.unsupervised.attribute.NumericToNominal filter to convert your number of bedrooms numeric attribute into a nominal one. This filter simply turns numbers into their string representation to be used as labels of a nominal attribute.
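To illustrate the idea outside of WEKA, here is a minimal Python sketch. It is not WEKA's code: `interpolate` and `numeric_to_nominal` are hypothetical helper names, and the bedroom values are made up. It shows why interpolating between two numeric instances can give 3.5 bedrooms, and what "numbers become labels" means: each distinct value turns into a string label, so there is no in-between value left to generate.

```python
def interpolate(a, b, gap):
    """SMOTE-style synthetic value on the line between a and b (0 <= gap <= 1)."""
    return a + gap * (b - a)


def numeric_to_nominal(values):
    """Turn each number into its string form and collect the distinct labels.

    This only mimics the idea behind NumericToNominal; it is an illustration,
    not the filter's actual implementation.
    """
    labels = [format(v, "g") for v in sorted(set(values))]
    converted = [format(v, "g") for v in values]
    return labels, converted


# Made-up example data for the number_of_bedrooms attribute.
bedrooms = [3, 4, 2, 3, 5]

# Numeric attribute: a point halfway between 3 and 4 bedrooms is fractional.
synthetic = interpolate(3, 4, 0.5)

# Nominal attribute: only the labels seen in the data exist.
labels, nominal = numeric_to_nominal(bedrooms)

print(synthetic)
print(labels)
print(nominal)
```

With the attribute stored as labels like these, a synthetic instance can only reuse one of the existing labels. The filter would need to be applied before SMOTE for this to have any effect on the generated instances.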

Upvotes: 0
