Reputation: 1377
My goal is to cluster the items in the list below based on distances between any two consecutive items. The number of clusters is not specified, only the maximum distance between any two consecutive items that must not be exceeded for them to be in the same cluster.
My attempt
import itertools
max_dist=20
_list=[1,48,52,59,89,94,103,147,151,165]
Ideal_result= [[1],[48,52,59],[89,94,103],[147,151,165]]
def clust(list_x, max_dist):
q=[]
for a, b in itertools.combinations(list_x,2):
if b-a<=20:
q.append(a),(b)
else:continue
yield q
print list(clust(_list,max_dist))
Output:
[[48,48,52,89,89,94,147,147,151],[48,48,52,89,89,94,147,147,151],..]`
The output is completely wrong, but I just wanted to include my attempt.
Any suggestions on how to get the ideal result? Thanks.
Upvotes: 2
Views: 2176
Reputation: 64022
This passes your test:
def cluster(items, key_func):
items = sorted(items)
clusters = [[items[0]]]
for item in items[1:]:
cluster = clusters[-1]
last_item = cluster[-1]
if key_func(item, last_item):
cluster.append(item)
else:
clusters.append([item])
return clusters
Where key_func
returns True
if the current and previous items should be part of the same cluster:
>>> cluster([1,48,52,59,89,94,103,147,151,165], lambda curr, prev: curr-prev < 20)
[[1], [48, 52, 59], [89, 94, 103], [147, 151, 165]]
Another possibility would be to modify the "equivalent code" for itertools.groupby()
to likewise take more than one argument for the key function.
Upvotes: 2