Reputation: 1184
I want to cluster some data points but the maximum number of points per cluster is limited. So there is a maximum size per cluster. Is there any clustering algorithm for that? Also Can I define my own size function. For example, instead of considering the number of points in a cluster as its size, I want to sum a column of all the points in the cluster.
Upvotes: 4
Views: 4090
Reputation: 91
The problem of k-means clustering with minimum size constraints is addressed in this paper:
Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering." Microsoft Research, Redmond (2000): 1-8.
However, the approach proposed in this paper can be easily extended to the maximum size constraints.
Here is an implementation of this algorithm and an extension to it which addresses both minimum size and maximum size constraints.
AS for your question about custom size function, it will be a more difficult problem for which I guess local search approaches are more appropriate.
Upvotes: 2
Reputation: 6186
A quick and not a optimal solution is spliting data into 2 parts iteratively until the number of data is under the limitation.
Upvotes: 1
Reputation: 77464
As clustering will usually try to make the clusters as large as possible, this isn't really clustering then anymore. More like a minimum spanning tree, where you remove the longest edges to find groups.
You could try something like x-means, i.e. a k-means variation where you split clusters that you consider to be too large.
Upvotes: 0