Reputation: 8075
I have a list:
d = [23, 67, 110, 25, 69, 24, 102, 109]
How can I group the nearest values using a dynamic gap and get a list of tuples like this? What is the fastest method?
[(23, 24, 25), (67, 69), (102, 109, 110)]
Upvotes: 9
Views: 5692
Reputation: 51
You can use the DBSCAN clustering algorithm from sklearn for this.
import numpy as np
from sklearn.cluster import DBSCAN
d = [23, 67, 110, 25, 69, 24, 102, 109]
threshold = 3  # max distance between numbers
dbscan = DBSCAN(eps=threshold, min_samples=1)
labels = dbscan.fit(np.asarray(d).reshape(-1, 1)).labels_
print(d)
print(labels)
# [23, 67, 110, 25, 69, 24, 102, 109]
# [0, 1, 2, 0, 1, 0, 3, 2]
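The labels only say which cluster each value belongs to. To get tuples like the ones asked for in the question, the values can be bucketed by label. A minimal sketch, with the labels printed above pasted in as a literal so it runs without scikit-learn; `group_by_label` is a hypothetical helper name, not part of sklearn:

```python
from collections import defaultdict

def group_by_label(values, labels):
    """Bucket values by cluster label and return sorted tuples."""
    buckets = defaultdict(list)
    for value, label in zip(values, labels):
        buckets[label].append(value)
    # Sort inside each group, then order the groups by their smallest value.
    return sorted(tuple(sorted(group)) for group in buckets.values())

d = [23, 67, 110, 25, 69, 24, 102, 109]
labels = [0, 1, 2, 0, 1, 0, 3, 2]  # copied from the DBSCAN output above
groups = group_by_label(d, labels)
print(groups)
# [(23, 24, 25), (67, 69), (102,), (109, 110)]
```

Note that with a threshold of 3, 102 ends up in a cluster of its own, so a larger threshold would be needed to reproduce the grouping in the question.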
Upvotes: 1
Reputation: 215039
Like
d = [23, 67, 110, 25, 69, 24, 102, 109]
d.sort()
# gaps between consecutive elements of the sorted list
diff = [y - x for x, y in zip(d, d[1:])]
avg = sum(diff) / len(diff)
m = [[d[0]]]
for x in d[1:]:
    # compare with the first element of the current group
    if x - m[-1][0] < avg:
        m[-1].append(x)
    else:
        m.append([x])
print(m)
## [[23, 24, 25], [67, 69], [102, 109, 110]]
First we calculate the average difference between consecutive elements of the sorted list, and then group together elements whose distance from the first element of the current group is less than that average.
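For reuse, the same idea can be wrapped in a function that returns tuples, as the question asks. This is only a sketch of a variant, not the code above: it compares each value with its immediate predecessor rather than with the first element of the group, and the name `group_by_average_gap` is made up for illustration.

```python
def group_by_average_gap(values):
    """Split sorted values wherever the gap to the previous value
    is not smaller than the average gap."""
    ordered = sorted(values)
    if len(ordered) < 2:
        # Nothing to compare: zero or one value forms at most one group.
        return [tuple(ordered)] if ordered else []
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = sum(gaps) / len(gaps)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev < avg:
            groups[-1].append(cur)   # small gap: same group
        else:
            groups.append([cur])     # large gap: start a new group
    return [tuple(g) for g in groups]

print(group_by_average_gap([23, 67, 110, 25, 69, 24, 102, 109]))
# [(23, 24, 25), (67, 69), (102, 109, 110)]
```

One caveat of using the average as the cut-off: for evenly spaced input such as [1, 2, 3], no gap is below the average, so every value lands in its own group.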
Upvotes: 23