orange

Reputation: 8090

Weighted clustering with pycluster

I've managed to adapt a code snippet showing how to use PyCluster's k-means clustering algorithm. I was hoping to be able to weight the data points, but unfortunately I can only weight the features. Am I missing something, or is there a trick I can use to make some of the points count more than others?

import numpy as np
import Pycluster as pc

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])

# would like to specify 6 weights, one per row of `points`,
# but kcluster only accepts one weight per feature (4 here)
weights = np.asarray([1.0, 1.0, 1.0, 1.0])

clusterid, error, nfound = pc.kcluster(
    points, nclusters=2, transpose=0, npass=10, method='a', dist='e', weight=weights
)
centroids, _ = pc.clustercentroids(points, clusterid=clusterid)
print(centroids)

Upvotes: 4

Views: 3396

Answers (2)

scc

Reputation: 10696

Nowadays you can pass `sample_weight` to the `fit` method of scikit-learn's `KMeans`. Here's an example.
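A minimal sketch using the question's data (note the parameter is `sample_weight`, singular; the weight values here are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])

# one weight per data point -- here the second point counts twice as much
sample_weight = np.array([1.0, 2.0, 1.0, 1.0, 1.0, 1.0])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(points, sample_weight=sample_weight)
print(km.labels_)
print(km.cluster_centers_)
```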

Upvotes: 1

Prune

Reputation: 77847

Weighting individual data points is not part of the k-means algorithm as defined, so it's not available in pycluster, MLlib, or TrustedAnalytics.

You can, however, add duplicate data points. For instance, if you want that second data point to count twice as much, alter your list to read:

points = np.asarray([
    [1.0, 20, 30, 50],
    [1.2, 15, 34, 50],
    [1.2, 15, 34, 50],
    [1.6, 13, 20, 55],
    [0.1, 16, 40, 26],
    [0.3, 26, 30, 23],
    [1.4, 20, 28, 20],
])

Upvotes: 0
