I'm using Weka for data mining. My data consists of school grades (numeric output between 0 and 20). I want to model the grades as a binary classification (i.e. "pass" if grade >= 10, else "fail"). But when I use discretization in Weka with 2 bins, a grade of exactly 10 goes to the lower bin (fail group). I want a grade of 10 to be part of the upper bin (pass group). How can I solve this problem?
The MathExpression filter will work.
Example ARFF file, with y duplicated as y2 so that y2 can be transformed while y keeps the original grade, and x as just another attribute:
@relation so_2020-04-01
@attribute x numeric
@attribute y numeric
@attribute y2 numeric
@data
0.32789,12,12
0.932754,8,8
0.750824,20,20
0.601161,17,17
0.867985,2,2
0.469246,19,19
0.570984,10,10
0.82686,18,18
0.536315,6,6
0.878526,15,15
0.318298,7,7
0.278011,5,5
0.78302,4,4
0.557255,1,1
0.510926,3,3
0.429421,13,13
0.642457,9,9
0.227804,11,11
0.655531,16,16
0.41444,14,14
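Set up the MathExpression filter (weka.filters.unsupervised.attribute.MathExpression) with the expression ifelse(A>10,1,0) and the attribute range first,2 set to be ignored, so that only y2 is transformed. These are the -E and -R options that appear in the relation name of the output.
After you Apply, y2 has 1 for pass and 0 for fail. Note that A>10 is a strict comparison, so a grade of exactly 10 is coded 0 (fail), as the 0.570984 row in the output shows. To put 10 in the pass group, as the question asks, use ifelse(A<10,0,1) instead.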
@relation 'so_2020-04-01-weka.filters.unsupervised.attribute.MathExpression-Eifelse(A>10,1,0)-Rfirst,2-unset-class-temporarily'
@attribute x numeric
@attribute y numeric
@attribute y2 numeric
@data
0.32789,12,1
0.932754,8,0
0.750824,20,1
0.601161,17,1
0.867985,2,0
0.469246,19,1
0.570984,10,0
0.82686,18,1
0.536315,6,0
0.878526,15,1
0.318298,7,0
0.278011,5,0
0.78302,4,0
0.557255,1,0
0.510926,3,0
0.429421,13,1
0.642457,9,0
0.227804,11,1
0.655531,16,1
0.41444,14,1
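To see why the comparison operator matters at the boundary, the ifelse logic can be mirrored outside Weka. This is a minimal plain-Python sketch, not Weka code: `binarize` and its `inclusive` parameter are names made up for illustration, and `grades` is the y column from the example file.

```python
# Plain-Python mirror of the MathExpression ifelse logic (not Weka code).
# The grades are the y column of the example ARFF file.
grades = [12, 8, 20, 17, 2, 19, 10, 18, 6, 15, 7, 5, 4, 1, 3, 13, 9, 11, 16, 14]


def binarize(grade, inclusive=True):
    """Return 1 for pass and 0 for fail, with 10 as the cut point.

    inclusive=True  mirrors ifelse(A<10,0,1): a grade of exactly 10 passes.
    inclusive=False mirrors ifelse(A>10,1,0): a grade of exactly 10 fails.
    """
    if inclusive:
        return 0 if grade < 10 else 1
    return 1 if grade > 10 else 0


strict = [binarize(g, inclusive=False) for g in grades]
inclusive = [binarize(g, inclusive=True) for g in grades]

# The two codings differ only where the grade is exactly 10.
differing = [g for g, s, i in zip(grades, strict, inclusive) if s != i]
print(differing)  # [10]
```

The `strict` list reproduces the y2 column of the filtered output, and the only row that changes under the inclusive rule is the one with a grade of 10.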
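You can then apply the NumericToNominal filter (weka.filters.unsupervised.attribute.NumericToNominal), restricted to y2, if you want the class variable to be nominal rather than numeric. Weka treats a numeric class as a regression target, so this step is needed before running a classifier on pass/fail.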
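What a nominal pass/fail class looks like can be sketched in the same way. This is an illustration in plain Python, not Weka's NumericToNominal: the label names `fail` and `pass` and the helper `to_label` are hypothetical, and the filter itself is expected to keep `0` and `1` as the nominal values rather than rename them.

```python
# Hypothetical helper: turn the 0/1 coding into readable class labels
# and build the matching ARFF nominal attribute declaration.
LABELS = {0: "fail", 1: "pass"}


def to_label(code):
    """Map a 0/1 code to its class label; reject anything else."""
    if code not in LABELS:
        raise ValueError(f"expected 0 or 1, got {code!r}")
    return LABELS[code]


codes = [1, 0, 1, 1, 0]  # first five y2 values from the filtered output
labels = [to_label(c) for c in codes]

# ARFF declares a nominal attribute by listing its values in braces.
declaration = "@attribute y2 {" + ",".join(LABELS[k] for k in sorted(LABELS)) + "}"
print(labels)
print(declaration)  # @attribute y2 {fail,pass}
```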