Reputation: 325
Is there a way to pass a custom distance function (e.g. Jaccard distance) to MATLAB's k-means implementation?
For example, the Jaccard distance as computed by pdist:
D = pdist(X,'jaccard');
Upvotes: 3
Views: 3441
Reputation: 355
One option is to decompose your distance matrix into a feature space using an SVD, then run k-means on the new feature space given by the SVD scores. See The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.
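A minimal MATLAB sketch of this idea, under some assumptions: it uses cmdscale (classical multidimensional scaling, which eigendecomposes the distance matrix) in place of a hand-rolled SVD, the toy data X is made up, and keeping three dimensions is an arbitrary choice. It requires the Statistics Toolbox.

```matlab
% Sketch: embed Jaccard distances with classical MDS, then cluster the scores.
rng(0);                                   % reproducible toy data (assumption)

% Two groups of binary rows that switch on different columns (made-up data)
X = double([rand(20,10) < 0.8, rand(20,10) < 0.2; ...
            rand(20,10) < 0.2, rand(20,10) < 0.8]);

D = pdist(X, 'jaccard');                  % pairwise Jaccard distances (vector form)
[Y, e] = cmdscale(D);                     % embedded coordinates and eigenvalues

p = min(3, size(Y, 2));                   % keep a few leading dimensions (arbitrary)
idx = kmeans(Y(:, 1:p), 2, 'Replicates', 5);   % ordinary k-means on the scores
```

Jaccard distances are not guaranteed to embed exactly in Euclidean space, so the clusters found this way only approximate a clustering under the original distance; inspecting the eigenvalues in e shows how much is lost.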
Alternatively, use k-medoids, which works directly on a distance matrix. In R, as.dist() converts a matrix to a dist object that you can then run k-medoids on.
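For MATLAB rather than R, a hedged sketch of the k-medoids route follows. It assumes a release whose Statistics and Machine Learning Toolbox ships kmedoids; the data X and the handle name jaccardFun are illustrative only.

```matlab
% Sketch: k-medoids with Jaccard distance, built-in and via a function handle.
rng(0);                                   % reproducible toy data (assumption)
X = double([rand(20,10) < 0.8, rand(20,10) < 0.2; ...
            rand(20,10) < 0.2, rand(20,10) < 0.8]);

% Built-in Jaccard option; C holds the medoids, which are rows of X
[idx, C] = kmedoids(X, 2, 'Distance', 'jaccard');

% The same distance as a user-supplied handle: zi is 1-by-n, zj is m-by-n,
% and the result must be an m-by-1 vector of distances.
jaccardFun = @(zi, zj) pdist2(zj, zi, 'jaccard');
idx2 = kmedoids(X, 2, 'Distance', jaccardFun);
```

Because medoids are actual data points, no averaging step is needed, which is why an arbitrary distance function can be plugged in here.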
Upvotes: 2
Reputation: 38032
From the documentation, we learn that we can pass a 'distance' option to kmeans:
'distance'
Distance measure, in p-dimensional space. kmeans minimizes with respect to this parameter. kmeans computes centroid clusters differently for the different supported distance measures.
'sqEuclidean'
Squared Euclidean distance (default). Each centroid is the mean of the points in that cluster.
'cityblock'
Sum of absolute differences, i.e., the L1 distance. Each centroid is the component-wise median of the points in that cluster.
'cosine'
One minus the cosine of the included angle between points (treated as vectors). Each centroid is the mean of the points in that cluster, after normalizing those points to unit Euclidean length.
'correlation'
One minus the sample correlation between points (treated as sequences of values). Each centroid is the component-wise mean of the points in that cluster, after centering and normalizing those points to zero mean and unit standard deviation.
'Hamming'
Percentage of bits that differ (only suitable for binary data). Each centroid is the component-wise median of points in that cluster.
So, for example:
[idx,ctrs] = kmeans(X,2, 'Distance','cityblock');
As for custom functions (i.e., user-implemented): AFAIK, this is not possible without hacking the relevant m-files.
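For binary data, one option that needs no modification is the built-in 'Hamming' measure listed above. It is not the same as Jaccard distance (it also counts positions where both points are zero as agreement), so treat this as a rough stand-in; the data X below is made up for illustration.

```matlab
% Sketch: k-means on binary data with the built-in Hamming distance.
rng(0);                                   % reproducible toy data (assumption)
X = double([rand(20,10) < 0.8, rand(20,10) < 0.2; ...
            rand(20,10) < 0.2, rand(20,10) < 0.8]);

% X must contain only 0s and 1s for the Hamming option
[idx, ctrs] = kmeans(X, 2, 'Distance', 'hamming');
```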
Upvotes: 0