Reputation: 31
I have a list which needs to be split into multiple lists of differing size. The values in the original list randomly increase in size until the split point, where the value drops before continuing to increase. The values must remain in order after being split.
E.g. Original list
[100, 564, 572, 578, 584, 590, 596, 602, 608, 614, 620, 625, 631, 70, 119,
125, 130, 134, 139, 144, 149, 154, 159, 614, 669, 100, 136, 144, 149, 153,
158, 163, 167, 173, 179, 62, 72, 78, 82, 87, 92, 97, 100, 107, 112, 117,
124, 426, 100, 129, 135, 140, 145, 151]
After split:
[100, 564, 572, 578, 584, 590, 596, 602, 608, 614, 620, 625, 631]
[70, 119, 125, 130, 134, 139, 144, 149, 154, 159, 614, 669]
[100, 136, 144, 149, 153, 158, 163, 167, 173, 179]
[62, 72, 78, 82, 87, 92, 97, 100, 107, 112, 117, 124, 426]
[100, 129, 135, 140, 145, 151]
I have searched for a solution, finding numpy.where
and numpy.diff
as likely candidates, but I'm unsure how to implement.
Thanks for the help!
Upvotes: 2
Views: 4365
Reputation: 221564
Approach #1
Using NumPy's numpy.split
to have list of arrays as output -
import numpy as np
arr = np.array(a) # a is input list
out = np.split(arr,np.flatnonzero(arr[1:] < arr[:-1])+1)
Approach #2
Using loop comrehension to split the list directly and thus avoid numpy.split
for efficiency purposes -
idx = np.r_[0, np.flatnonzero(np.diff(a)<0)+1, len(a)]
out = [a[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
Output for given sample -
In [52]: idx = np.r_[0, np.flatnonzero(np.diff(a)<0)+1, len(a)]
In [53]: [a[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
Out[53]:
[[100, 564, 572, 578, 584, 590, 596, 602, 608, 614, 620, 625, 631],
[70, 119, 125, 130, 134, 139, 144, 149, 154, 159, 614, 669],
[100, 136, 144, 149, 153, 158, 163, 167, 173, 179],
[62, 72, 78, 82, 87, 92, 97, 100, 107, 112, 117, 124, 426],
[100, 129, 135, 140, 145, 151]]
We are using np.diff
here, which feeds in a list in this case and then computes the differentiation. So, a better alternative would be with converting to array and then using comparison between shifted slices of it instead of actually computing the differentiation values. Thus, we could get idx
like this as well -
arr = np.asarray(a)
idx = np.r_[0, np.flatnonzero(arr[1:] < arr[:-1])+1, len(arr)]
Let's time it and see if there's any improvement -
In [84]: a = np.random.randint(0,100,(1000,100)).cumsum(1).ravel().tolist()
In [85]: %timeit np.r_[0, np.flatnonzero(np.diff(a)<0)+1, len(a)]
100 loops, best of 3: 3.24 ms per loop
In [86]: arr = np.asarray(a)
In [87]: %timeit np.asarray(a)
100 loops, best of 3: 3.05 ms per loop
In [88]: %timeit np.r_[0, np.flatnonzero(arr[1:] < arr[:-1])+1, len(arr)]
10000 loops, best of 3: 77 µs per loop
In [89]: 3.05+0.077
Out[89]: 3.127
So, a marginal improvement there with the shifting and comparing method with the conversion : np.asarray(a)
eating-up most of the runtime.
Upvotes: 7
Reputation: 11302
If you want to use numpy.diff
and numpy.where
, you can try
a = numpy.array(your original list)
numpy.split(a, numpy.where(numpy.diff(a) < 0)[0] + 1)
Explanation:
numpy.diff(a)
calculates the difference of each item and its preceding one, and returns an array.
numpy.diff(a) < 0
returns a boolean array, where each element is replaced by whether it satisfies the predicate, in this case less than zero. This is the result of numpy.ndarray
overloading the comparison operators.
numpy.where
takes this boolean array and returns the indices where the element is not zero. In this context False
evaluates to zero, so you take the indices of True
.
[0]
takes the first (and only) axis
+ 1
You want to break off from after the indices, not before
Finally, numpy.split
breaks them off at the given indices.
Upvotes: -1
Reputation: 679
I know you tagged numpy
. But here's a implementation without any dependencies too:
lst = [100, 564, 572, 578, 584, 590, 596, 602, 608, 614, 620, 625, 631, 70, 119,
125, 130, 134, 139, 144, 149, 154, 159, 614, 669, 100, 136, 144, 149, 153,
158, 163, 167, 173, 179, 62, 72, 78, 82, 87, 92, 97, 100, 107, 112, 117,
124, 426, 100, 129, 135, 140, 145, 151]
def split(lst):
last_pos = 0
for i in range(1, len(lst)):
if lst[i] < lst[i-1]:
yield lst[last_pos:i]
last_pos = i
if(last_pos <= len(lst)-1):
yield lst[last_pos:]
print([x for x in split(lst)])
Upvotes: 4