Reputation: 8090
I've managed to adapt a code snippet showing how to use PyCluster's k-means clustering algorithm. I was hoping to be able to weight the data points, but unfortunately I can only weight the features. Am I missing something, or is there maybe a trick I can use to make some of the points count more than others?
import numpy as np
import Pycluster as pc
points = np.asarray([
[1.0, 20, 30, 50],
[1.2, 15, 34, 50],
[1.6, 13, 20, 55],
[0.1, 16, 40, 26],
[0.3, 26, 30, 23],
[1.4, 20, 28, 20],
])
# I'd like to specify 6 weights, one per data point in `points`,
# but `kcluster` only accepts one weight per feature (4 here)
weights = np.asarray([1.0, 1.0, 1.0, 1.0])
clusterid, error, nfound = pc.kcluster(
points, nclusters=2, transpose=0, npass=10, method='a', dist='e', weight=weights
)
centroids, _ = pc.clustercentroids(points, clusterid=clusterid)
print(centroids)
Upvotes: 4
Views: 3396
Reputation: 10696
Nowadays you can pass `sample_weight` to the `fit` method of sklearn's `KMeans`. Here's an example.
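A minimal sketch of that approach, using the data from the question (the weight values and `n_clusters`/`random_state` settings are illustrative, not from the original post):

```python
import numpy as np
from sklearn.cluster import KMeans

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])

# One weight per data point; heavier points pull centroids toward them.
sample_weight = np.asarray([1.0, 2.0, 1.0, 1.0, 1.0, 3.0])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(points, sample_weight=sample_weight)

print(km.labels_)           # cluster assignment for each of the 6 points
print(km.cluster_centers_)  # weighted centroids, one row per cluster
```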
Upvotes: 1
Reputation: 77847
Weighting the individual data points is not part of the k-means algorithm itself, so it's not available in Pycluster, MLlib, or TrustedAnalytics.
You can, however, add duplicate data points. For instance, if you want that second data point to count twice as much, alter your list to read:
points = np.asarray([
[1.0, 20, 30, 50],
[1.2, 15, 34, 50],
[1.2, 15, 34, 50],
[1.6, 13, 20, 55],
[0.1, 16, 40, 26],
[0.3, 26, 30, 23],
[1.4, 20, 28, 20],
])
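For integer weights, this duplication can be automated with `np.repeat` rather than editing the list by hand (a sketch; the `weights` values are illustrative):

```python
import numpy as np

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])

# Integer weight per point: how many copies of each row to cluster.
weights = np.asarray([1, 2, 1, 1, 1, 1])

# Repeat row i `weights[i]` times along axis 0, then cluster the result.
weighted_points = np.repeat(points, weights, axis=0)

print(weighted_points.shape)  # (7, 4): the second row now appears twice
```

Note this only works for integer weights, and it inflates the dataset, so it's best suited to small weight values.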
Upvotes: 0