Why does WEKA-TestSets has to have a class attribute?

Question

I have very well defined training set for machine learning (only string attributes).

e.g.

@relation training_rel

@attribute class {politics,sports}
@attribute text string

@data
politics,'some text about politics over here'
... // a lot of other training instances of class politics
sports,'and now some sports over here'
... // a lot of other training instances of class sports

Ok this is my training set, of course just an example ... Now I would like to build a classifier (NaiveBayes). This works completely fine. I know that most of the classifiers cannot handle text, so I have to filter my data. I use a StringToWordVector for this.

All of the examples over the web that I have found define there test-instances also with class value (http://www.cs.ubc.ca/labs/beta/Projects/autoweka/datasets/) But why? I mean I don't know whether my text belongs to politics or sports, this why I use the classifier to get to know this ... Do I understand something wrong?

Why does WEKA-TestSets has to have a class attribute?

Answers (1)

Related Questions