user3394131

Reputation: 199

Weka OneR gives ? as classifier model

My Weka OneR models all return what looks like an overfit rule set, ending with a question mark that maps to a fixed result, like so:

FollowersMeanCoords_Col:
    < 0.33340000000000003   -> False
    >= 0.33340000000000003  -> True
    ?   -> False
(114357/163347 instances correct)

Is OneR simply saying "I can't find anything, so we assume the rest is False"? But then why is there a clear cut in the data (everything below 0.33 is False, everything above or equal is True)? And is there a way to prevent this?

Thanks in advance!

Upvotes: 1

Views: 451

Answers (1)

nekomatic

Reputation: 6284

The ? refers to missing values - your training data must have some values of FollowersMeanCoords_Col missing for some instances.

The model in your question says that if FollowersMeanCoords_Col for an instance (data point) is less than 0.3334..., or is missing, it will classify the instance as False, otherwise it will classify it as True.
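To make the three branches concrete, here is a minimal Python sketch of the rule as printed in the question. It is not Weka code: the function name `classify` and the convention that `None` or NaN stands for a missing value are assumptions made for illustration only.

```python
import math

# Threshold copied from the model output in the question.
THRESHOLD = 0.33340000000000003


def classify(followers_mean_coords):
    """Apply the printed OneR rule to one attribute value.

    Assumption: a missing value is passed in as None or NaN.
    """
    if followers_mean_coords is None or math.isnan(followers_mean_coords):
        return False  # the "?" branch: missing value
    if followers_mean_coords < THRESHOLD:
        return False  # the "< 0.3334..." branch
    return True       # the ">= 0.3334..." branch
```

So the `?` line is just a third branch of the same rule, used only when the attribute has no value for an instance.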

OneR is a very simple classification algorithm which works by finding the one attribute from the training data that gives the least error when used to make a classification rule. For OneR to overfit there would need to be an attribute that happened to classify the training data well, but didn't generalise to future test data. It's more likely that OneR will give you models that are robust but inaccurate.
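The idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Weka's implementation: it handles nominal attributes only (the discretisation of numeric attributes that produced the 0.3334 cut is left out), it treats a missing value (`None`) as one extra attribute value `"?"` to mirror the output in the question, and the names `one_r`, `rows` and `labels` are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = "?"  # assumption: missing values count as one more attribute value


def one_r(rows, labels):
    """Pick the single attribute whose one-level rule makes the fewest errors.

    rows   -- list of dicts mapping attribute name -> nominal value (None = missing)
    labels -- list of class labels, one per row
    Returns (best_attribute, rule, n_correct), where rule maps value -> class.
    """
    best = None
    for attr in rows[0]:
        # Count class labels separately for each value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            value = row[attr] if row[attr] is not None else MISSING
            counts[value][label] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        correct = sum(c[rule[v]] for v, c in counts.items())
        # Keep the attribute with the most correct training instances.
        if best is None or correct > best[2]:
            best = (attr, rule, correct)
    return best


# Tiny made-up dataset, for illustration only.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": None, "b": "q"},
]
labels = [False, False, True, False]
attr, rule, correct = one_r(rows, labels)
```

On this toy data attribute `a` is chosen, and its rule contains a `"?"` entry because one row has no value for `a`, which is the same situation as the `?` line in the question's model.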

Upvotes: 1
