Reputation: 2390
I am trying to find clusters inside an array, i.e. groups of consecutive elements where the difference between t[n+1] and t[n] is less than a certain value. I have a numpy array that is a sequence of time stamps. I can find the differences between consecutive time stamps using numpy.diff(), but I am having a hard time determining the clusters without looping through the array. To illustrate:
import numpy as np
t = np.array([147, 5729, 5794, 5806, 6798, 8756, 8772, 8776, 9976])
dt = np.diff(t)
# dt is now array([5582, 65, 12, 992, 1958, 16, 4, 1200])
If my cluster condition is dt < 100, then t[1], t[2], and t[3] would form one cluster and t[5], t[6], and t[7] would form another. I have tried playing around with numpy.where(), but I have not managed to get the conditions right to separate out the clusters into:
cluster1 = np.array([5729, 5794, 5806])
cluster2 = np.array([8756, 8772, 8776])
or something along those lines.
Any help is appreciated.
Upvotes: 3
Views: 2887
Reputation: 97261
import numpy as np
t = np.array([ 147, 5729, 5794, 5806, 6798, 8756, 8772, 8776, 9976])
dt = np.diff(t)
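# Indices where a new group starts: just after every gap that breaks the cluster condition (dt < 100)
pos = np.where(dt >= 100)[0] + 1
# Split t at those indices (Python 3 print)
print(np.split(t, pos))
The output is: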
[array([147]),
array([5729, 5794, 5806]),
array([6798]),
array([8756, 8772, 8776]),
array([9976])]
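The split above also returns isolated time stamps (147, 6798, 9976) as one-element groups, while the question only asks for the real clusters. A minimal sketch that drops those singletons, assuming t is sorted in ascending order; the helper name `find_clusters` and its `max_gap` and `min_size` parameters are hypothetical, chosen only for illustration:

```python
import numpy as np

def find_clusters(t, max_gap=100, min_size=2):
    """Hypothetical helper: split a sorted 1-D array into runs whose
    consecutive gaps are all < max_gap, keeping runs of at least min_size."""
    t = np.asarray(t)
    dt = np.diff(t)
    # A new group starts right after every gap that breaks the condition dt < max_gap.
    breaks = np.flatnonzero(dt >= max_gap) + 1
    groups = np.split(t, breaks)
    # Drop groups that are too small, e.g. isolated time stamps.
    return [g for g in groups if g.size >= min_size]

t = np.array([147, 5729, 5794, 5806, 6798, 8756, 8772, 8776, 9976])
cluster1, cluster2 = find_clusters(t)
print(cluster1)  # [5729 5794 5806]
print(cluster2)  # [8756 8772 8776]
```

Passing min_size=1 gives back the same five groups as the plain np.split call. Everything stays vectorized: the only Python-level loop runs over the resulting groups, not over the individual time stamps.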
Upvotes: 7