Samuel Paz

Reputation: 243

ARFF without one class in instances

So, I've been using the example "TextCategorizationTest.java" from this tutorial https://weka.wikispaces.com/Text+categorization+with+WEKA.

I have one directory with two folders: "neg" and "pos". These two folders represent the classes that should be in my ARFF. The problem is that when I try to create the ARFF file, the instances don't contain the class attribute value for "pos", but they do contain it for "neg".

Here is my ARFF file: http://pastebin.com/6nGWEyMq

As you can see, "pos" instances are written in this format: @data {1 1,3 1,24 1,27 1,29 1,37 ...} while "neg" instances are written in this format: {0 neg,1 1,2 1,3 1,6 1 ...}

What can I do to fix this ARFF? I would accept solutions using either Weka code or the Weka GUI.

Upvotes: 1

Views: 287

Answers (1)

Sentry

Reputation: 4113

Your ARFF file is totally fine, there is no need to change it.

Your ARFF file is in the sparse format, which means that attributes with value 0 will be omitted. For scenarios where you expect a lot of attributes to be 0, e.g. word count, this format is much more compact.

The format is:

{index value,index value,index value, ...}

But as I said, attributes with value 0 will be omitted, so only the indices for attributes that are not 0 are listed here.

Nominal attributes are stored using their value index (not to be confused with the attribute index). The class attribute definition (the first attribute, with attribute index 0) lists its values in the order {pos,neg}, so "pos" has value index 0 and "neg" has value index 1. That is why the class is missing from all "pos" entries: "pos" has value index 0, so it is omitted like any other zero value.
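To make the omission rule concrete, here is a minimal sketch in plain Java of how a row could be written in sparse notation. It is not Weka's own code; the class name SparseRowWriter, the method toSparse and the three-attribute example row are made up for illustration. The only thing taken from your file is the value order {pos,neg} of the class attribute.

```java
import java.util.List;
import java.util.StringJoiner;

public class SparseRowWriter {
    // Class labels in declaration order: "pos" has value index 0, "neg" has value index 1.
    // (Assumption: mirrors the {pos,neg} order described in the answer.)
    static final List<String> CLASS_LABELS = List.of("pos", "neg");

    /**
     * Formats one row in sparse notation.
     * classValueIndex is the value index of the class (attribute 0);
     * counts holds the numeric attributes 1..n.
     */
    public static String toSparse(int classValueIndex, int[] counts) {
        StringJoiner row = new StringJoiner(",", "{", "}");
        // A value index of 0 is treated like any other zero: it is not written.
        if (classValueIndex != 0) {
            row.add("0 " + CLASS_LABELS.get(classValueIndex));
        }
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0) {
                // Attribute indices of the counts start at 1 because the class is attribute 0.
                row.add((i + 1) + " " + counts[i]);
            }
        }
        return row.toString();
    }

    public static void main(String[] args) {
        int[] counts = {1, 0, 1}; // hypothetical word indicators
        System.out.println(toSparse(0, counts)); // "pos" row: {1 1,3 1}
        System.out.println(toSparse(1, counts)); // "neg" row: {0 neg,1 1,3 1}
    }
}
```

Both rows carry a class; the "pos" row just does not spell it out.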

The first columns of some lines of your data in the sparse format (as you posted it):

@data
{1 1,3 1,24 1,27 1, ...}
{1 1,4 1,5 1,8 1,17 1,24 1,26 1,29 1, ...}
...
{0 neg,17 1, ...}

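This is equivalent to the following in the dense format (the class is shown here by its value index, 0 for "pos" and 1 for "neg"; an actual dense ARFF file would write the label itself and would not use braces):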

@data
{0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0, ...}
{0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,1, ...}
...
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0, ...}
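If you want to check the expansion yourself, the following is a rough sketch in plain Java that reads one sparse row and fills in the omitted zeros. It does not use Weka; the class name SparseRowReader, its methods and the short five-attribute rows are hypothetical, and it assumes that attribute 0 is the nominal class with values {pos,neg} and that every other attribute is an integer.

```java
import java.util.Arrays;
import java.util.List;

public class SparseRowReader {
    // Assumption: class values in declaration order, "pos" = 0 and "neg" = 1.
    static final List<String> CLASS_LABELS = List.of("pos", "neg");

    /** Expands a sparse row such as "{0 neg,3 1}" into numAttributes values. */
    public static int[] toDense(String sparseRow, int numAttributes) {
        int[] dense = new int[numAttributes]; // every attribute starts at 0
        String body = sparseRow.trim();
        body = body.substring(1, body.length() - 1).trim(); // drop the braces
        if (body.isEmpty()) {
            return dense;
        }
        for (String pair : body.split(",")) {
            String[] parts = pair.trim().split("\\s+", 2); // "index value"
            int index = Integer.parseInt(parts[0]);
            String value = parts[1].trim();
            // Attribute 0 holds a label, stored as its value index; the rest are numbers.
            dense[index] = (index == 0) ? CLASS_LABELS.indexOf(value) : Integer.parseInt(value);
        }
        return dense;
    }

    /** Returns the class label of an expanded row. */
    public static String classLabel(int[] dense) {
        return CLASS_LABELS.get(dense[0]);
    }

    public static void main(String[] args) {
        int[] first = toDense("{1 1,3 1}", 5);
        int[] second = toDense("{0 neg,3 1}", 5);
        System.out.println(Arrays.toString(first) + " -> " + classLabel(first));   // [0, 1, 0, 1, 0] -> pos
        System.out.println(Arrays.toString(second) + " -> " + classLabel(second)); // [1, 0, 0, 1, 0] -> neg
    }
}
```

A row with no entry for attribute 0 comes back as "pos", which is how such rows in your file should be read.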

Upvotes: 1
