Reputation: 153
I'm using nominal independent variables such as 'gender', 'education_level', and 'martial_status', and a nominal dependent variable, 'True_or_false'.
I have created an ARFF file with attributes labelled with their datatype. In the case of nominal attributes, I have also listed the meanings of their number assignments.
I would like to know not only which filter (Discretize or NumericToNominal) is the correct one to use for such variables, but also how these filters differ.
Upvotes: 0
Views: 172
Reputation: 2608
The Discretize filters (the supervised version also takes the class attribute into account) turn a continuous variable (e.g., distance_travelled) into bins, based on the distribution of values (check the synopsis of each filter for details).
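To make the binning idea concrete, here is a minimal sketch in plain Python of equal-width binning, which is one common unsupervised strategy. It is not Weka's implementation: the function name `equal_width_bins`, the bin labels, and the sample distances are all made up for illustration, and Weka's own bin boundaries and labels may differ.

```python
def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in one bin
        else:
            # The maximum value is clamped into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{idx + 1}")
    return labels


# Hypothetical continuous attribute (illustrative values only).
distance_travelled = [1.0, 4.0, 12.5, 20.0, 31.0]
binned = equal_width_bins(distance_travelled, 3)
print(binned)  # ['bin1', 'bin1', 'bin2', 'bin2', 'bin3']
```

The point is that the original numbers are replaced by a small set of interval labels, so information about the exact value is deliberately thrown away.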
The NumericToNominal filter is for situations where a categorical variable (e.g., mode_of_transport like car/bike/bus represented by 0/1/2) got interpreted incorrectly as a numeric one (a value of 1.5 makes no sense in such cases). This can happen during CSV imports or similar, where there is no metadata available about the data type of each column. This filter simply turns the numbers it encounters into string labels.
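By contrast, converting a numeric column to a nominal one does no binning at all. The following is a rough sketch in plain Python of the idea, again not Weka's code: the function name `numeric_to_nominal` and the sample data are hypothetical, and the exact label formatting Weka produces is an assumption here.

```python
def numeric_to_nominal(values):
    """Turn each distinct number into a string label, without any binning."""
    def as_label(v):
        # Assumption for illustration: whole numbers are shown without ".0".
        return str(int(v)) if float(v).is_integer() else str(v)

    categories = [as_label(c) for c in sorted(set(values))]
    labels = [as_label(v) for v in values]
    return labels, categories


# Hypothetical column where 0/1/2 stand for car/bike/bus but were read as numbers.
mode_of_transport = [0.0, 2.0, 1.0, 0.0]
labels, categories = numeric_to_nominal(mode_of_transport)
print(labels)      # ['0', '2', '1', '0']
print(categories)  # ['0', '1', '2']
```

Every distinct number becomes its own category, so nothing is merged or lost; only the interpretation of the values changes from "quantity" to "label".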
Upvotes: 1