Niek Tax

Weka Java library: how to get the string representation of a classified instance?

I'm currently working on a project that classifies search queries into eight types: {athlete, actor, artist, politician, geo, facility, QA, definition}. After a bit of work, a Multilayer Perceptron classifier now correctly classifies 78% of my 300 sample queries when evaluated with stratified 10-fold cross-validation, which I think is reasonably good.

Using the Weka Java library, I implemented the whole thing in Java, so that I can write a program that dynamically feeds a query to the classifier and retrieves its query type. I have successfully implemented the classifier training part. The next step is to use either classifyInstance() or distributionForInstance() to determine the class the query is assigned to.

However, classifyInstance() only returns a double, and I don't know how to get the actual query type out of it. The Weka wikispaces say I can use

unlabeled.classAttribute().value((int) clsLabel);

after calling classifyInstance() to get the String representation of the class. In my case, however, this always seems to return the empty string.

Using distributionForInstance(), I can successfully retrieve an array of eight double values between 0 and 1 (which is good, as I classify into eight query types). However, what is the order of this array? Does the first element correspond to the first class that occurs in my training file? Or is there some other predefined order (e.g., alphabetical)? The Weka documentation does not say anything about this.

I hope someone will be able to help me out!

Answers (1)

kaz

Internally, Weka handles all values as doubles. When you create the Attribute, you pass it the possible nominal values. The double that classifyInstance() returns is the index of the chosen value in that original list. So if you had code that looked like this:

String[] attributeValues = {"a", "b", "c"};
// Weka 3.7+ takes a List<String> here (older versions use a FastVector)
Attribute a = new Attribute("attributeName", java.util.Arrays.asList(attributeValues));

and classifyInstance() returned 2.0, then the class it chose would be attributeValues[2], i.e. "c".
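A minimal, library-free sketch of that lookup, assuming (hypothetically) that the eight query types from the question were declared in the order shown. In real code the labels would come from the class Attribute and the double from classifier.classifyInstance(inst). Note that Weka's Attribute.value(int) returns an empty string when the attribute is neither nominal nor string, so an empty result may indicate that the class attribute is not nominal.

```java
import java.util.Arrays;
import java.util.List;

class LabelFromIndex {
    // Hypothetical stand-in for the nominal values given to the class Attribute.
    // Weka's class indices follow exactly this declaration order.
    static final List<String> CLASS_VALUES = Arrays.asList(
            "athlete", "actor", "artist", "politician",
            "geo", "facility", "QA", "definition");

    // classifyInstance() returns the predicted class index as a double,
    // so cast it back to int before looking up the name.
    static String labelFor(double clsLabel) {
        return CLASS_VALUES.get((int) clsLabel);
    }

    public static void main(String[] args) {
        double clsLabel = 2.0; // hypothetical value returned by classifyInstance()
        System.out.println(labelFor(clsLabel)); // prints "artist"
    }
}
```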

With distributionForInstance(), the indices of the two arrays match: attributeValues[0] is the name of the class whose probability is the first element of the returned array, attributeValues[1] the second, and so on.
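The pairing can be sketched without Weka as follows. The probability array is made-up data standing in for the output of distributionForInstance(), and the class order is the same hypothetical one as above. Picking the index of the largest probability gives the most likely class; as far as I know, this is also what classifyInstance() returns for most Weka classifiers with a nominal class.

```java
import java.util.Arrays;
import java.util.List;

class LabelFromDistribution {
    // Hypothetical class order; entry i of the distribution belongs to entry i here.
    static final List<String> CLASS_VALUES = Arrays.asList(
            "athlete", "actor", "artist", "politician",
            "geo", "facility", "QA", "definition");

    // Index of the largest probability; ties go to the lowest index.
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    static String predictedLabel(double[] dist) {
        return CLASS_VALUES.get(argMax(dist));
    }

    public static void main(String[] args) {
        // Hypothetical output of distributionForInstance(): one probability per class.
        double[] dist = {0.05, 0.10, 0.02, 0.60, 0.08, 0.05, 0.05, 0.05};
        for (int i = 0; i < dist.length; i++) {
            System.out.printf("%-10s %.2f%n", CLASS_VALUES.get(i), dist[i]);
        }
        System.out.println("predicted: " + predictedLabel(dist)); // prints "predicted: politician"
    }
}
```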

UPDATE (because of a downvote): the above method won't work if you're letting Weka create the Instances object itself (e.g., if you're reading from an ARFF file). That doesn't seem to be the case given your question, but if it is, please post your code so we can see what's going on.
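For the ARFF case, my understanding is that the class values are indexed in the order they are listed in the header's @attribute declaration, not in the order they first appear in the data and not alphabetically. In practice you would simply call data.classAttribute().value(i) on the loaded Instances. The sketch below only illustrates that ordering: the helper nominalValues and the attribute name querytype are hypothetical, and the parser deliberately ignores quoted values.

```java
import java.util.ArrayList;
import java.util.List;

class ArffClassOrder {
    // Hypothetical helper: extracts nominal values from a simple declaration such as
    // "@attribute type {a, b, c}". Quoted values containing commas or braces are not handled.
    static List<String> nominalValues(String declaration) {
        int open = declaration.indexOf('{');
        int close = declaration.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("not a nominal declaration: " + declaration);
        }
        List<String> values = new ArrayList<>();
        for (String v : declaration.substring(open + 1, close).split(",")) {
            values.add(v.trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String header = "@attribute querytype "
                + "{athlete, actor, artist, politician, geo, facility, QA, definition}";
        List<String> classValues = nominalValues(header);
        // Position i here is the index that classifyInstance() returns for that class
        // and the position of its probability in distributionForInstance().
        for (int i = 0; i < classValues.size(); i++) {
            System.out.println(i + " -> " + classValues.get(i));
        }
    }
}
```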
