Reputation: 145
I have already computed clusters and want to use the ELKI library only to evaluate this clustering.
So I have data in this form:
0.234 0.923 cluster_1 true_cluster1
0.543 0.874 cluster_2 true_cluster3
...
I tried to:
Create 2 databases: with result labels and with reference labels:
double [][] data;
String [] reference_labels, result_labels;
DatabaseConnection dbc1 = new ArrayAdapterDatabaseConnection(data, result_labels);
Database db1 = new StaticArrayDatabase(dbc1, null);
DatabaseConnection dbc2 = new ArrayAdapterDatabaseConnection(data, reference_labels);
Database db2 = new StaticArrayDatabase(dbc2, null);
Perform ByLabel Clustering for each database:
Clustering<Model> clustering1 = new ByLabelClustering().run(db1);
Clustering<Model> clustering2 = new ByLabelClustering().run(db2);
Use ClusterContingencyTable for comparing clusterings and getting measures:
ClusterContingencyTable ct = new ClusterContingencyTable(true, false);
ct.process(clustering1, clustering2);
PairCounting paircount = ct.getPaircount();
The problem is that the measures are not computed.
I looked into the source code of ContingencyTable and PairCounting, and it seems that this won't work if the clusterings come from different databases, while a database can have only one labels relation.
Is there a way to do this in ELKI?
Upvotes: 0
Views: 118
Reputation: 8725
You can modify the ByLabelClustering class easily (or implement your own) to only use the first label, or only use the second label; then you can use only one database.
Or you can use the 3-parameter constructor, passing the same start ID (here 0) for both databases:
DatabaseConnection dbc1 = new ArrayAdapterDatabaseConnection(data, result_labels, 0);
Database db1 = new StaticArrayDatabase(dbc1, null);
DatabaseConnection dbc2 = new ArrayAdapterDatabaseConnection(data, reference_labels, 0);
Database db2 = new StaticArrayDatabase(dbc2, null);
so that the DBIDs are the same. Then ClusterContingencyTable should work.
By default, ELKI would continue enumerating objects, so the first database would have IDs 1..n, and the second n+1..2n. But in order to compare clusterings, they need to contain the same objects, not disjoint sets.
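To illustrate why disjoint IDs leave nothing to compare, here is a minimal plain-Java sketch that does not use ELKI at all. The class `SharedIdsSketch` and its methods are hypothetical names made up for this illustration, and the `offset` parameter only mimics the start ID; this is not how ELKI implements its contingency table. It groups object IDs by label and sums the overlaps between every pair of clusters.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration class; not part of ELKI.
class SharedIdsSketch {
    /** Groups object IDs by label; the object at index i gets ID offset + i. */
    static Map<String, Set<Integer>> clusterByLabel(String[] labels, int offset) {
        Map<String, Set<Integer>> clusters = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            clusters.computeIfAbsent(labels[i], k -> new HashSet<>()).add(offset + i);
        }
        return clusters;
    }

    /** Sums the sizes of all pairwise cluster intersections (the contingency table total). */
    static int sharedObjects(Map<String, Set<Integer>> a, Map<String, Set<Integer>> b) {
        int total = 0;
        for (Set<Integer> ca : a.values()) {
            for (Set<Integer> cb : b.values()) {
                Set<Integer> common = new HashSet<>(ca);
                common.retainAll(cb); // keep only IDs present in both clusters
                total += common.size();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] result = {"cluster_1", "cluster_2", "cluster_1", "cluster_2"};
        String[] reference = {"true_cluster1", "true_cluster3", "true_cluster1", "true_cluster1"};

        // Same start ID for both: every object appears in both clusterings.
        System.out.println(sharedObjects(clusterByLabel(result, 0), clusterByLabel(reference, 0)));

        // Second set of IDs continues after the first: the ID sets are disjoint.
        System.out.println(sharedObjects(clusterByLabel(result, 0),
                clusterByLabel(reference, result.length)));
    }
}
```

With a shared start ID the overlap total equals the number of objects; with continued enumeration every intersection is empty, which matches the symptom of measures not being computed.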
Upvotes: 1