gregory

Reputation: 188

ELKI: How to Specify Feature Columns of CSV for K-Means

I am trying to run K-Means using ELKI MiniGUI. I have a CSV dataset of 15 features (columns) and a label column. I would like to do multiple runs of K-Means with different combinations of the feature columns.

Is there anywhere in the MiniGUI where I can specify the indices of the columns I would like to use for clustering?

If not, what is the simplest way to achieve this by changing/extending ELKI in Java?

Upvotes: 0

Views: 81

Answers (1)

Erich Schubert

Reputation: 8725

This is obviously easy to achieve with Java code, or simply by preprocessing the data as necessary: generate the variants you need (say, 10), then launch ELKI on each of them via the command line.
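The preprocessing route could be sketched roughly as below. This is only an illustration, not part of ELKI: the helper names `select_columns` and `write_variant`, the file names, and the assumption that the label sits in the last of 16 columns are all hypothetical.

```python
import csv


def select_columns(rows, feature_indices, label_index):
    """Keep only the chosen feature columns, followed by the label column."""
    return [
        [row[i] for i in feature_indices] + [row[label_index]]
        for row in rows
    ]


def write_variant(in_path, out_path, feature_indices, label_index):
    """Read a CSV file and write a copy restricted to the chosen columns."""
    with open(in_path, newline="") as src:
        # Skip empty lines so indexing into a row never fails on blanks.
        rows = [row for row in csv.reader(src) if row]
    with open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(
            select_columns(rows, feature_indices, label_index)
        )


# Hypothetical usage: 15 features in columns 0..14, label in column 15.
# write_variant("data.csv", "variant_0_1_2.csv", [0, 1, 2], 15)
# write_variant("data.csv", "variant_3_7_9.csv", [3, 7, 9], 15)
```

Each output file can then be passed to a separate ELKI run.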

But there is also a filter to select columns: NumberVectorFeatureSelectionFilter. It is a vector transformation, so the indices refer to the numeric part only; labels are treated separately at this point. To use only columns 0, 1, 2:

-dbc.filter transform.NumberVectorFeatureSelectionFilter
-projectionfilter.selectedattributes 0,1,2

The filter could be extended using our newer IntRangeParameter to allow for specifications such as 1..3,5..8; but this has not been implemented yet.
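Until then, a range specification can be expanded outside ELKI into the plain comma-separated list the filter accepts. The sketch below assumes that `a..b` is meant as an inclusive range, which the answer does not state; `expand_ranges` is a hypothetical helper, not an ELKI function.

```python
def expand_ranges(spec):
    """Expand a spec such as "1..3,5..8" into a list of column indices.

    Assumption: "a..b" denotes an inclusive range of integers.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            start, end = part.split("..")
            indices.extend(range(int(start), int(end) + 1))
        else:
            indices.append(int(part))
    return indices


# Build the value for -projectionfilter.selectedattributes
print(",".join(str(i) for i in expand_ranges("1..3,5..8")))  # prints 1,2,3,5,6,7,8
```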

Upvotes: 1
