What is the difference between an omitted value and an unknown value in WEKA?

Question

What is the difference between an unknown value and an omitted value for an attribute in WEKA? I learned that for a missing (unknown) value we put ? as the value of the corresponding attribute, and that an omitted value is treated as 0. What is the difference?

Suppose we were to plot the data in an n-dimensional space. How would the unknown values be represented along their axes, given that they are not zero?

Thanks Abhishek S

Sicco · Accepted Answer

Each classifier deals with unknown values differently. For example, some replace each unknown value with the mean of that feature. This way the unknown values can be plotted.
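To make the mean-replacement idea concrete, here is a minimal sketch in plain Python. It is not WEKA code: the function name `impute_mean` and the use of `None` to stand in for `?` are assumptions made for illustration only.

```python
from statistics import mean

def impute_mean(column):
    """Replace unknown entries (None, standing in for '?') with the
    mean of the known entries in the same column."""
    known = [v for v in column if v is not None]
    if not known:
        raise ValueError("column has no known values to average")
    fill = mean(known)
    return [fill if v is None else v for v in column]

# Hypothetical feature column with one unknown value.
ages = [20.0, None, 40.0, 30.0]
filled = impute_mean(ages)
print(filled)
```

After this step every instance has a coordinate on that axis, so it can be plotted. Note that the filled-in value is an estimate, not a real observation.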

Omitted values are only used in sparse ARFF files. These files are useful if your dataset is sparse (i.e. most values are 0). Instead of writing all the 0's in the file, you only write the non-zero values together with their locations (attribute indices). All values that are not represented are therefore assumed to be 0.

Basically, if you don't know a value, you assign it the unknown value ?.
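As a rough illustration of how an omitted value differs from an unknown one, the sketch below expands a single sparse data row into its dense form. It is plain Python, not WEKA's own parser. The helper name `sparse_to_dense` is hypothetical, and it assumes numeric attributes only (no quoted strings or nominal labels). Sparse ARFF rows are written as `{index value, ...}` with zero-based attribute indices.

```python
def sparse_to_dense(line, num_attributes):
    """Expand one sparse ARFF data row, e.g. '{1 5, 3 ?}', into a dense list.

    Omitted attributes become 0.0; an explicit '?' becomes None (unknown).
    """
    dense = [0.0] * num_attributes      # omitted entries default to 0
    body = line.strip().strip("{}").strip()
    if not body:
        return dense                    # '{}' means every value is 0
    for pair in body.split(","):
        index_text, value_text = pair.split()
        index = int(index_text)         # attribute indices are zero-based
        dense[index] = None if value_text == "?" else float(value_text)
    return dense

# Attributes 0, 2 and 4 are omitted (so 0); attribute 3 is unknown.
row = sparse_to_dense("{1 5, 3 ?}", 5)
print(row)
```

The omitted positions come back as 0.0, while the position written as ? stays unknown. An unknown value in a sparse file still has to be written out explicitly.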
