bockdavidson

Reputation: 2693

Weka: How to remove instances with missing values

I'm using the Weka application with a CSV file, and I need to remove the instances with missing values. I tried using the MultiFilter together with the removevalues filter, but I think I am doing it wrong, since it filters out ALL my instances. How do I do this correctly?

Upvotes: 0

Views: 3184

Answers (1)

nekomatic

Reputation: 6284

To remove instances with missing values in particular attributes, you can use the weka.filters.unsupervised.instance.SubsetByExpression filter with an expression such as

not ismissing(ATT5)

to remove instances with missing values in the attribute with index 5, or

not (ismissing(ATT5) or ismissing(ATT8))

to remove instances with missing values in attributes 5 or 8, and so on.

If you were trying to use the RemoveWithValues filter, it can be done that way too, but you need to clear the nominalIndices field (that is, remove the -L argument from the filter command) and set splitPoint to a value below the minimum value of the attribute being filtered. Otherwise the filter removes any instance that satisfies any one of its conditions, not only those with a missing value.

I can't see any obvious way of removing instances that have missing values in any attribute, other than building an expression for SubsetByExpression that checks all of them one by one.
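If there are many attributes, typing that expression by hand gets tedious. As a rough sketch, a few lines of plain Java can generate it. The class MissingValueExpression and its methods are made up for illustration and are not part of Weka, and the sketch assumes attributes are numbered ATT1, ATT2, ... starting from 1, as in the expressions above.

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of Weka): builds an expression string
// that can be pasted into the SubsetByExpression filter.
final class MissingValueExpression {

    // Builds an expression that keeps only instances with no missing value
    // in any of the given attribute indices (assumed to be 1-based).
    static String forAttributes(int... indices) {
        if (indices.length == 0) {
            throw new IllegalArgumentException("at least one attribute index is required");
        }
        StringJoiner checks = new StringJoiner(" or ");
        for (int index : indices) {
            if (index < 1) {
                throw new IllegalArgumentException("attribute indices start at 1: " + index);
            }
            checks.add("ismissing(ATT" + index + ")");
        }
        // A single check needs no parentheses, matching the first example above.
        if (indices.length == 1) {
            return "not " + checks;
        }
        return "not (" + checks + ")";
    }

    // Builds the expression covering attributes 1..numAttributes.
    static String forAllAttributes(int numAttributes) {
        int[] indices = new int[numAttributes];
        for (int i = 0; i < numAttributes; i++) {
            indices[i] = i + 1;
        }
        return forAttributes(indices);
    }

    public static void main(String[] args) {
        // Example: a dataset assumed to have 4 attributes.
        System.out.println(forAllAttributes(4));
    }
}
```

The printed string can then be copied into the expression field of SubsetByExpression. Pass the real number of attributes in your dataset instead of 4.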

Upvotes: 3
