mohammadreza

Reputation: 43

Pass/fail binary classification of school grades in Weka

I'm using Weka for data mining. My data consists of school grades (a numeric output between 0 and 20). I want to model the grades as a binary classification problem (i.e. "pass" if grade >= 10, else "fail"). But when I use discretization in Weka with 2 bins, values equal to 10 go to the lower bin (the fail group). I want values equal to 10 to be part of the upper bin (the pass group). How can I solve this problem?

Upvotes: 0

Views: 139

Answers (1)

zbicyclist

Reputation: 696

The MathExpression filter will work.

Example ARFF file. y2 is a copy of y, so that I can transform y2 and keep the original grade in y; x is just another attribute:

@relation so_2020-04-01

@attribute x numeric
@attribute y numeric
@attribute y2 numeric

@data
0.32789,12,12
0.932754,8,8
0.750824,20,20
0.601161,17,17
0.867985,2,2
0.469246,19,19
0.570984,10,10
0.82686,18,18
0.536315,6,6
0.878526,15,15
0.318298,7,7
0.278011,5,5
0.78302,4,4
0.557255,1,1
0.510926,3,3
0.429421,13,13
0.642457,9,9
0.227804,11,11
0.655531,16,16
0.41444,14,14

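Set up the MathExpression filter (weka.filters.unsupervised.attribute.MathExpression) with the expression ifelse(A>10,1,0) and the ignore range first,2, so that only y2 is transformed. These settings are recorded in the relation name of the output below.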

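After you Apply, y2 holds 1 for pass and 0 for fail. Note that ifelse(A>10,1,0) is a strict comparison, so in the output below a grade of exactly 10 is coded 0 (fail). To count 10 as a pass, as the question asks, use ifelse(A<10,0,1) instead; that changes only the row with y = 10.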

@relation 'so_2020-04-01-weka.filters.unsupervised.attribute.MathExpression-Eifelse(A>10,1,0)-Rfirst,2-unset-class-temporarily'

@attribute x numeric
@attribute y numeric
@attribute y2 numeric

@data
0.32789,12,1
0.932754,8,0
0.750824,20,1
0.601161,17,1
0.867985,2,0
0.469246,19,1
0.570984,10,0
0.82686,18,1
0.536315,6,0
0.878526,15,1
0.318298,7,0
0.278011,5,0
0.78302,4,0
0.557255,1,0
0.510926,3,0
0.429421,13,1
0.642457,9,0
0.227804,11,1
0.655531,16,1
0.41444,14,1
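
The boundary behaviour is easy to check outside Weka. Here is a minimal plain-Java sketch, not Weka API: the class and method names are hypothetical stand-ins for the recorded expression ifelse(A>10,1,0) and for the variant ifelse(A<10,0,1).

```java
// Plain-Java stand-in for two MathExpression variants; this is not Weka code.
class PassFailThreshold {
    // Mirrors ifelse(A>10,1,0): strict comparison, so a grade of 10 is coded 0 (fail).
    static int strictAbove(double grade) {
        return grade > 10 ? 1 : 0;
    }

    // Mirrors ifelse(A<10,0,1): every grade from 10 upward is coded 1 (pass).
    static int passAtTen(double grade) {
        return grade < 10 ? 0 : 1;
    }

    public static void main(String[] args) {
        // Grades on either side of the cut point, plus the cut point itself.
        double[] grades = {9, 10, 11};
        for (double g : grades) {
            System.out.println(g + " strict=" + strictAbove(g) + " passAtTen=" + passAtTen(g));
        }
    }
}
```

The two methods differ only for a grade of exactly 10, which is the case the question cares about.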

You can then use the NumericToNominal filter if you want the class variable to be nominal rather than numeric.
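
If you generate the ARFF file yourself, another option is to write the class as a nominal attribute from the start, so no filter is needed afterwards. This is a rough sketch in plain Java, not Weka API; the class name, the relation name grades_pass_fail and the attribute name result are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Weka classes): builds ARFF text with a nominal pass/fail class.
class NominalClassArff {
    // Grades of 10 and above count as a pass, as the question requires.
    static String label(double grade) {
        return grade < 10 ? "fail" : "pass";
    }

    // Returns the lines of an ARFF file; x and grade must have the same length.
    static List<String> toArff(String relation, double[] x, double[] grade) {
        List<String> lines = new ArrayList<>();
        lines.add("@relation " + relation);
        lines.add("");
        lines.add("@attribute x numeric");
        // Hypothetical attribute name; the nominal values are listed in braces.
        lines.add("@attribute result {fail,pass}");
        lines.add("");
        lines.add("@data");
        for (int i = 0; i < x.length; i++) {
            lines.add(x[i] + "," + label(grade[i]));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Three rows taken from the example data, including the grade of 10.
        double[] x = {0.32789, 0.932754, 0.570984};
        double[] grade = {12, 8, 10};
        toArff("grades_pass_fail", x, grade).forEach(System.out::println);
    }
}
```

Weka can open the resulting file directly, with result already usable as a nominal class attribute.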

Upvotes: 1
