Reputation: 658
I run K-Means using:
KMeansDriver.run(new Path("./bd.seq.file"), new Path(clustersLoc), new Path("output"),
new EuclideanDistanceMeasure(), 0.001, 10, true, 0.5, false);
My aim is to find out which cluster each of my original vectors belongs to. From what I understand, this is supposed to be in output/clusteredPoints/part-m-00000, but that file looks like an empty (120-byte) sequence file.
What gives?
Upvotes: 1
Views: 584
Reputation: 658
OK, I finally got it (at least partially). It has to do with the 8th parameter of KMeansDriver.run().
If it has a value of '0' it behaves the same as in Mahout 0.5.
The parameter's name is 'clusterClassificationThreshold' and its javadoc states:
Is a clustering strictness / outlier removal parameter. Its value should be between 0 and 1. Vectors having pdf below this value will not be clustered.
For any Mahout beginners like me, pdf stands for "probability density function". I'm not sure I fully understand this parameter (googling did not help here; the javadocs are all you're going to get), but my guess is that, because it is part of a mechanism that filters the original vectors, the Mahout developers chose to disable writing the clustered points whenever it is not '0'.
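To make the javadoc's rule concrete, here is a minimal plain-Java sketch of a threshold filter. It does not use Mahout and is not Mahout's actual code: the class ThresholdSketch and the method assign are hypothetical names, and it assumes each vector has one pdf value per cluster and is dropped when its best pdf is below the threshold.

```java
public class ThresholdSketch {

    /**
     * Hypothetical helper, not part of Mahout.
     * Returns the index of the cluster with the highest pdf,
     * or -1 if that pdf is below the threshold (vector not clustered).
     */
    public static int assign(double[] pdfs, double threshold) {
        int best = -1;
        double bestPdf = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < pdfs.length; i++) {
            if (pdfs[i] > bestPdf) {
                bestPdf = pdfs[i];
                best = i;
            }
        }
        // "Vectors having pdf below this value will not be clustered."
        return (best >= 0 && bestPdf >= threshold) ? best : -1;
    }

    public static void main(String[] args) {
        // Made-up pdf values for one vector against two clusters.
        double[] pdfs = {0.3, 0.4};
        System.out.println("threshold 0.5: " + assign(pdfs, 0.5));
        System.out.println("threshold 0.0: " + assign(pdfs, 0.0));
    }
}
```

Under these assumptions, a threshold of 0 can never reject a vector with a non-negative pdf, which would be consistent with every point showing up in clusteredPoints when the 8th parameter is '0'.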
Upvotes: 1