QPTR

Running k-medoids algorithm in ELKI

I am trying to use ELKI to run k-medoids (with k=3) on a dataset stored as an ARFF file (read with ELKI's ARFFParser):

[image: screenshot of the dataset]

The dataset has 7 dimensions. However, the clustering results I obtain only cluster on a single dimension, and only do so for 3 attributes, ignoring the rest. Like this:

[image: screenshot of the clustering result]

Could anyone help me obtain a clustering visualization that covers all dimensions?

Erich Schubert

ELKI is mostly used with numerical data.

Currently, ELKI does not have a "mixed" data type, unfortunately.

The ARFF parser will split your data set into multiple relations:

  1. a 1-dimensional numerical relation containing age
  2. a LabelList relation storing sex and region
  3. a 1-dimensional numerical relation containing salary
  4. a LabelList relation storing married
  5. a 1-dimensional numerical relation storing children
  6. a LabelList relation storing car
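
To make the splitting concrete, here is a small Python sketch (not ELKI code) that groups consecutive attributes of the same kind into one relation. The attribute list and its types are inferred from the six relations above, and the grouping rule is only an illustration of the observed behaviour, not the parser's actual implementation.

```python
from itertools import groupby

# Hypothetical attribute list, inferred from the relations listed above.
ATTRIBUTES = [
    ("age", "numeric"),
    ("sex", "nominal"),
    ("region", "nominal"),
    ("salary", "numeric"),
    ("married", "nominal"),
    ("children", "numeric"),
    ("car", "nominal"),
]


def split_into_relations(attributes):
    """Group runs of consecutive attributes of the same kind into one relation.

    A run of numeric attributes becomes one numerical vector relation;
    a run of nominal attributes becomes one label list relation.
    """
    relations = []
    for kind, group in groupby(attributes, key=lambda attr: attr[1]):
        names = [name for name, _ in group]
        rel_type = "numerical" if kind == "numeric" else "LabelList"
        relations.append((rel_type, names))
    return relations


if __name__ == "__main__":
    for i, (rel_type, names) in enumerate(split_into_relations(ATTRIBUTES), 1):
        print(i, rel_type, names)
```

Because the numeric attributes are not adjacent, no run of them is longer than one, so every numerical relation ends up 1-dimensional.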

Apparently it has mixed up the relation labels, though. Other than that, this approach works perfectly well for ARFF data sets that consist of numerical data plus a class label, for example, which is the use case this parser was written for. The behaviour is well-defined and consistent, just not what you expected it to do.

The algorithm then ran on the first relation it could work with, i.e. age only.

So here is what you need to do:

  1. Implement an efficient data type for storing mixed type data.
  2. Modify the ARFF parser to produce a single relation of mixed type data.
  3. Implement a distance function for this type, because the lack of a mixed type data representation means we do not have a distance to go with it either.
  4. Choose this new distance function in k-Medoids.
  5. Share the code, so others do not have to do this again. ;-)
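
Inside ELKI, steps 3 and 4 would be Java code written against ELKI's own distance-function API, which is not reproduced here. As a rough, standalone Python sketch of the idea only: it assumes Gower's distance (a common choice for mixed numeric/nominal data, swapped in here; the steps above do not prescribe a particular distance) and a simple alternating k-medoids on a precomputed distance matrix, not ELKI's implementation. All function names are hypothetical.

```python
from typing import List, Sequence, Union

Value = Union[float, str]
Record = Sequence[Value]


def attribute_ranges(data: List[Record], numeric: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numeric attribute; 0.0 placeholder for nominal ones."""
    ranges = []
    for j, is_num in enumerate(numeric):
        if is_num:
            col = [float(row[j]) for row in data]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(0.0)
    return ranges


def gower_distance(a: Record, b: Record, numeric: Sequence[bool],
                   ranges: Sequence[float]) -> float:
    """Gower's distance: mean of per-attribute dissimilarities, each in [0, 1]."""
    total = 0.0
    for x, y, is_num, r in zip(a, b, numeric, ranges):
        if is_num:
            # Numeric: absolute difference scaled by the attribute's range.
            total += abs(float(x) - float(y)) / r if r > 0 else 0.0
        else:
            # Nominal: 0 if the labels match, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(numeric)


def distance_matrix(data: List[Record], numeric: Sequence[bool]) -> List[List[float]]:
    """All pairwise Gower distances."""
    ranges = attribute_ranges(data, numeric)
    return [[gower_distance(a, b, numeric, ranges) for b in data] for a in data]


def k_medoids(dist: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Simple alternating k-medoids on a precomputed distance matrix.

    Returns the cluster index (0..k-1) of each object.
    """
    n = len(dist)
    medoids = list(range(k))  # naive initialisation: the first k objects
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every object joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: each cluster's new medoid minimises the summed
        # distance to the other members of that cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


if __name__ == "__main__":
    # Hypothetical toy records: (age, sex, salary).
    data = [(25.0, "m", 30000.0), (27.0, "m", 32000.0),
            (60.0, "f", 90000.0), (62.0, "f", 95000.0)]
    dist = distance_matrix(data, [True, False, True])
    print(k_medoids(dist, 2))
```

Precomputing the full matrix keeps the sketch short but costs O(n²) memory, so it is only meant for small data sets.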

Alternatively, you could write a script to encode your data as a purely numerical data set; then it will work fine. In my opinion, though, the results of one-hot encoding and similar encodings are usually not very convincing.
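
Such an encoding script might look roughly like the following Python sketch. It assumes min-max scaling for the numeric attributes and one 0/1 column per distinct value for each nominal attribute; the function name and the toy records are hypothetical.

```python
import csv
import sys
from typing import List, Sequence, Tuple, Union

Value = Union[float, str]


def one_hot_encode(
    names: Sequence[str],
    records: List[Sequence[Value]],
    numeric: Sequence[bool],
) -> Tuple[List[str], List[List[float]]]:
    """Turn mixed records into purely numerical vectors.

    Numeric attributes are min-max scaled to [0, 1]; each nominal attribute
    is replaced by one 0/1 column per distinct value (one-hot encoding).
    """
    header: List[str] = []
    columns: List[List[float]] = []
    for j, (name, is_num) in enumerate(zip(names, numeric)):
        col = [row[j] for row in records]
        if is_num:
            vals = [float(v) for v in col]
            lo, hi = min(vals), max(vals)
            span = hi - lo
            header.append(name)
            columns.append([(v - lo) / span if span > 0 else 0.0 for v in vals])
        else:
            for level in sorted(set(col)):
                header.append(f"{name}={level}")
                columns.append([1.0 if v == level else 0.0 for v in col])
    # Transpose the column lists back into one row per record.
    rows = [list(r) for r in zip(*columns)]
    return header, rows


if __name__ == "__main__":
    # Hypothetical toy records: (age, sex, salary).
    records = [(20, "m", 1000), (40, "f", 3000), (30, "m", 2000)]
    header, rows = one_hot_encode(["age", "sex", "salary"], records, [True, False, True])
    writer = csv.writer(sys.stdout)
    writer.writerow(header)
    writer.writerows(rows)
```

Note that the relative weight of the one-hot columns versus the scaled numeric columns is an arbitrary choice here, which is part of why such encodings often give unconvincing clusterings.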
