eudoxos

Reputation: 19075

Sort a list of floating-point numbers into groups

I have an unordered array of floating-point numbers. I know that the values always cluster around a few points, which are not known in advance. For illustration, this list

[10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]

has values clustered around 5 and 10, so I would like [5,10] as the answer.

I would like to find those clusters for lists with 1000+ values, where the number of clusters is probably around 10 (for some given tolerance). How can I do that efficiently?

Upvotes: 8

Views: 2179

Answers (2)

Fábio Diniz

Reputation: 10363

Check out python-cluster. With this library you could do something like this:

from cluster import *

data = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
cl = HierarchicalClustering(data, lambda x,y: abs(x-y))
print([mean(cluster) for cluster in cl.getlevel(1.0)])

And you would get:

[5.0062, 10.003333333333332]

(This is a very silly example, because I don't really know what you want to do, and because this is the first time I've used this library)

Upvotes: 16

HYRY

Reputation: 97331

You can try the following method:

Sort the array first, then use diff() to calculate the difference between each pair of consecutive values. Any difference larger than the threshold can be treated as a split position:

import numpy as np
x = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
x = np.sort(x)
th = 0.5
print([group.mean() for group in np.split(x, np.where(np.diff(x) > th)[0]+1)])

The result is:

[5.0061999999999998, 10.003333333333332]
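The same sort-and-split idea can also be sketched without NumPy, in case a plain function is handier. This is only a minimal sketch: the names cluster_centers and tolerance are made up for illustration, and it assumes that a gap wider than the tolerance between consecutive sorted values always separates two clusters.

```python
def cluster_centers(values, tolerance):
    """Return the mean of each run of sorted values whose gaps are <= tolerance."""
    groups = []
    for v in sorted(values):
        # Start a new group at the first value, or after a gap wider than tolerance.
        if not groups or v - groups[-1][-1] > tolerance:
            groups.append([v])
        else:
            groups[-1].append(v)
    return [sum(g) / len(g) for g in groups]


data = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
centers = cluster_centers(data, 0.5)
print([round(c, 3) for c in centers])  # prints [5.006, 10.003]
```

The sort dominates the cost, so this runs in O(n log n) and should be fine for lists of 1000+ values. Note that the tolerance bounds the gap between neighbours, not the total width of a cluster, so a long chain of closely spaced values ends up in one group.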

Upvotes: 8
