Matt_EJ
Matt_EJ

Reputation: 31

Python - Split array into multiple arrays dependent on array values

I have a list which needs to be split into multiple lists of differing size. The values in the original list randomly increase in size until the split point, where the value drops before continuing to increase. The values must remain in order after being split.

E.g. Original list

[100, 564, 572, 578, 584, 590, 596, 602, 608, 614, 620, 625, 631, 70, 119, 
125, 130, 134, 139, 144, 149, 154, 159, 614, 669, 100, 136, 144, 149, 153, 
158, 163, 167, 173, 179, 62, 72, 78, 82, 87, 92, 97, 100, 107, 112, 117, 
124, 426, 100, 129, 135, 140, 145, 151]

After split:

[100, 564, 572, 578, 584, 590, 596, 602, 608, 614, 620, 625, 631]
[70, 119, 125, 130, 134, 139, 144, 149, 154, 159, 614, 669]
[100, 136, 144, 149, 153, 158, 163, 167, 173, 179]
[62, 72, 78, 82, 87, 92, 97, 100, 107, 112, 117, 124, 426]
[100, 129, 135, 140, 145, 151]

I have searched for a solution, finding numpy.where and numpy.diff as likely candidates, but I'm unsure how to implement.

Thanks for the help!

Upvotes: 2

Views: 4365

Answers (3)

Divakar
Divakar

Reputation: 221564

Approach #1

Using NumPy's numpy.split to have list of arrays as output -

import numpy as np

arr = np.array(a) # a is input list
out = np.split(arr,np.flatnonzero(arr[1:] < arr[:-1])+1)

Approach #2

Using loop comrehension to split the list directly and thus avoid numpy.split for efficiency purposes -

idx = np.r_[0, np.flatnonzero(np.diff(a)<0)+1, len(a)]
out = [a[idx[i]:idx[i+1]] for i in range(len(idx)-1)]

Output for given sample -

In [52]: idx = np.r_[0, np.flatnonzero(np.diff(a)<0)+1, len(a)]

In [53]: [a[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
Out[53]: 
[[100, 564, 572, 578, 584, 590, 596, 602, 608, 614, 620, 625, 631],
 [70, 119, 125, 130, 134, 139, 144, 149, 154, 159, 614, 669],
 [100, 136, 144, 149, 153, 158, 163, 167, 173, 179],
 [62, 72, 78, 82, 87, 92, 97, 100, 107, 112, 117, 124, 426],
 [100, 129, 135, 140, 145, 151]]

We are using np.diff here, which feeds in a list in this case and then computes the differentiation. So, a better alternative would be with converting to array and then using comparison between shifted slices of it instead of actually computing the differentiation values. Thus, we could get idx like this as well -

arr = np.asarray(a)
idx = np.r_[0, np.flatnonzero(arr[1:] < arr[:-1])+1, len(arr)]

Let's time it and see if there's any improvement -

In [84]: a = np.random.randint(0,100,(1000,100)).cumsum(1).ravel().tolist()

In [85]: %timeit np.r_[0, np.flatnonzero(np.diff(a)<0)+1, len(a)]
100 loops, best of 3: 3.24 ms per loop

In [86]: arr = np.asarray(a)

In [87]: %timeit np.asarray(a)
100 loops, best of 3: 3.05 ms per loop

In [88]: %timeit np.r_[0, np.flatnonzero(arr[1:] < arr[:-1])+1, len(arr)]
10000 loops, best of 3: 77 µs per loop

In [89]: 3.05+0.077
Out[89]: 3.127

So, a marginal improvement there with the shifting and comparing method with the conversion : np.asarray(a) eating-up most of the runtime.

Upvotes: 7

Cong Ma
Cong Ma

Reputation: 11302

If you want to use numpy.diff and numpy.where, you can try

a = numpy.array(your original list)
numpy.split(a, numpy.where(numpy.diff(a) < 0)[0] + 1)

Explanation:

numpy.diff(a) calculates the difference of each item and its preceding one, and returns an array.

numpy.diff(a) < 0 returns a boolean array, where each element is replaced by whether it satisfies the predicate, in this case less than zero. This is the result of numpy.ndarray overloading the comparison operators.

numpy.where takes this boolean array and returns the indices where the element is not zero. In this context False evaluates to zero, so you take the indices of True.

[0] takes the first (and only) axis

+ 1 You want to break off from after the indices, not before

Finally, numpy.split breaks them off at the given indices.

Upvotes: -1

Flurin
Flurin

Reputation: 679

I know you tagged numpy. But here's a implementation without any dependencies too:

lst = [100, 564, 572, 578, 584, 590, 596, 602, 608, 614, 620, 625, 631, 70, 119, 
125, 130, 134, 139, 144, 149, 154, 159, 614, 669, 100, 136, 144, 149, 153, 
158, 163, 167, 173, 179, 62, 72, 78, 82, 87, 92, 97, 100, 107, 112, 117, 
124, 426, 100, 129, 135, 140, 145, 151]

def split(lst):
  last_pos = 0
  for i in range(1, len(lst)):
    if lst[i] < lst[i-1]:
      yield lst[last_pos:i]
      last_pos = i
  if(last_pos <= len(lst)-1):
    yield lst[last_pos:]

print([x for x in split(lst)])

Upvotes: 4

Related Questions