Evaluation of precomputed clustering using ELKI in Java

Question

I have already computed clusters and want to use the ELKI library only to evaluate this clustering.

So I have data in this form:

0.234 0.923 cluster_1 true_cluster1
0.543 0.874 cluster_2 true_cluster3
...

Here is what I tried:

Create two databases, one with the result labels and one with the reference labels:

double[][] data;
String[] reference_labels, result_labels;

DatabaseConnection dbc1 = new ArrayAdapterDatabaseConnection(data, result_labels);
Database db1 = new StaticArrayDatabase(dbc1, null);

DatabaseConnection dbc2 = new ArrayAdapterDatabaseConnection(data, reference_labels);
Database db2 = new StaticArrayDatabase(dbc2, null);

Perform ByLabel Clustering for each database:

Clustering clustering1 = new ByLabelClustering().run(db1);
Clustering clustering2 = new ByLabelClustering().run(db2);

Use ClusterContingencyTable for comparing clusterings and getting measures:

ClusterContingencyTable ct = new ClusterContingencyTable(true, false);
ct.process(clustering1, clustering2);
PairCounting paircount = ct.getPaircount();

The problem is that the measures are not computed.
I looked into the source code of ContingencyTable and PairCounting, and it seems that this won't work if the clusterings come from different databases, while a database can have only one labels relation.
Is there a way to do this in ELKI?

Erich Schubert · Accepted Answer

You can easily modify the ByLabelClustering class (or implement your own) to use only the first label or only the second label; then a single database is enough.

Or use the three-parameter constructor:

DatabaseConnection dbc1 = new ArrayAdapterDatabaseConnection(data, result_labels, 0);
Database db1 = new StaticArrayDatabase(dbc1, null);

DatabaseConnection dbc2 = new ArrayAdapterDatabaseConnection(data, reference_labels, 0);
Database db2 = new StaticArrayDatabase(dbc2, null);

so that the DBIDs are the same. Then ClusterContingencyTable should work.

By default, ELKI continues enumerating objects across databases, so the first database would have IDs 1..n and the second n+1..2n. But in order to compare clusterings, they need to contain the same objects, not disjoint sets.
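
To see why both clusterings must cover the same objects, here is a minimal plain-Java sketch of pair counting. It is not ELKI's implementation: the class name PairCountingSketch and its methods are hypothetical, and it counts unordered pairs of distinct objects, so ELKI's numbers may differ depending on the options you pass to ClusterContingencyTable. Each array index plays the role of one shared object ID, and the labels are made-up examples in the style of the question.

```java
// Hypothetical illustration only; this is not part of ELKI.
class PairCountingSketch {
  /**
   * Counts unordered pairs of distinct objects.
   * Returns {both, onlyFirst, onlySecond, neither}, where "both" means the
   * pair shares a cluster in both labelings.
   */
  static long[] countPairs(String[] first, String[] second) {
    if (first.length != second.length) {
      // Mirrors the requirement that both clusterings contain the same objects.
      throw new IllegalArgumentException("Both labelings must cover the same objects");
    }
    long both = 0, onlyFirst = 0, onlySecond = 0, neither = 0;
    for (int i = 0; i < first.length; i++) {
      for (int j = i + 1; j < first.length; j++) {
        boolean sameFirst = first[i].equals(first[j]);
        boolean sameSecond = second[i].equals(second[j]);
        if (sameFirst && sameSecond) {
          both++;
        } else if (sameFirst) {
          onlyFirst++;
        } else if (sameSecond) {
          onlySecond++;
        } else {
          neither++;
        }
      }
    }
    return new long[] { both, onlyFirst, onlySecond, neither };
  }

  /** Rand index: fraction of pairs on which the two labelings agree. */
  static double randIndex(long[] c) {
    long total = c[0] + c[1] + c[2] + c[3];
    return total == 0 ? 1.0 : (c[0] + c[3]) / (double) total;
  }

  /** Pair-counting F1: harmonic mean of pair precision and pair recall. */
  static double f1(long[] c) {
    long denom = 2 * c[0] + c[1] + c[2];
    return denom == 0 ? 0.0 : 2.0 * c[0] / denom;
  }

  public static void main(String[] args) {
    // Index i in both arrays refers to the same object (the shared ID).
    String[] result = { "cluster_1", "cluster_1", "cluster_2", "cluster_2" };
    String[] reference = { "true_cluster1", "true_cluster1", "true_cluster3", "true_cluster1" };
    long[] c = countPairs(result, reference);
    System.out.println("Rand index: " + randIndex(c));
    System.out.println("Pair F1: " + f1(c));
  }
}
```

If the two labelings referred to disjoint ID ranges (1..n and n+1..2n), no pair of objects would appear in both, and there would be nothing to count.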
