Reputation: 59
I have the following numpy array:
array=[1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7]
I need to break this array into smaller arrays of equal values, such as
[1,1,1,1] and [3,3,3]
My code for this is as follows but it doesn't work:
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq)-size))
counter=0
sub_arr=[]
arr=[]
for i in range(len(array)):
    if(array[i]==array[i+1]):
        counter+=1
    else:
        break
subarr=chunker(array,counter)
arr.append(sub_arr)
array=array[counter:]
What is an efficient way to break the array down into smaller arrays of equal values?
Upvotes: 2
Views: 405
Reputation: 12410
A numpy solution for floats and integers:
import numpy as np
a = np.asarray([1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7])
#calculate differences between neighbouring elements and get index where element changes
#sample output for index would be [ 4 6 9 10 16]
index = np.where(np.diff(a) != 0)[0] + 1
#separate arrays
print(np.split(a, index))
Sample output:
[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]
This method relies on np.diff, so it won't work for strings. In that case, go with DyZ's itertools approach.
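If you want to stay in NumPy for non-numeric data, the same idea can be sketched by comparing neighbouring elements directly instead of subtracting them. This is only a minimal sketch: the helper name split_runs is made up for illustration, and it assumes a 1-D input whose elements support elementwise != comparison.

```python
import numpy as np

def split_runs(values):
    """Split a 1-D sequence into arrays of consecutive equal values.

    Hypothetical helper for illustration; not part of NumPy.
    """
    a = np.asarray(values)
    if a.size == 0:
        return []
    # Indices where an element differs from its predecessor
    boundaries = np.flatnonzero(a[1:] != a[:-1]) + 1
    return np.split(a, boundaries)

runs = split_runs(["a", "a", "b", "c", "c", "c"])
print([r.tolist() for r in runs])
# [['a', 'a'], ['b'], ['c', 'c', 'c']]
```

For numeric input this should give the same split points as the np.diff version above.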
Upvotes: 3
Reputation: 21264
Here's an approach using Pandas:
import pandas as pd
(pd.Series(array)
.value_counts()
.reset_index()
.apply(lambda x: [x["index"]] * x[0], axis=1))
Explanation:
First, convert array to a Series, and use value_counts() to get a count of each unique entry:
counts = pd.Series(array).value_counts().reset_index()
   index  0
0      6  6
1      1  4
2      3  3
3      2  2
4      7  1
5      5  1
Then recreate each repeated-element list, using apply():
counts.apply(lambda x: [x["index"]] * x[0], axis=1)
0 [6, 6, 6, 6, 6, 6]
1 [1, 1, 1, 1]
2 [3, 3, 3]
3 [2, 2]
4 [7]
5 [5]
dtype: object
You can call .tolist() to convert from a Series of lists to a list of lists, if needed (the .values property returns a NumPy array of lists instead).
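One caveat: the column label 0 for the counts depends on the pandas version (releases from 2.0 on name that column "count"), so x[0] may not pick up the counts everywhere. A minimal sketch that avoids the column labels altogether by iterating over the (value, count) pairs is below. Like the code above, it groups all equal values together regardless of position, and the order of the groups follows value_counts() rather than the original array.

```python
import pandas as pd

array = [1, 1, 1, 1, 2, 2, 3, 3, 3, 5, 6, 6, 6, 6, 6, 6, 7]

# value_counts() returns a Series indexed by value, holding each value's count
counts = pd.Series(array).value_counts()

# items() yields (value, count) pairs, so no column names are involved
groups = [[value] * int(count) for value, count in counts.items()]
print(groups)
```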
Upvotes: 0
Reputation: 57033
NumPy has poor support for such grouping. I suggest using itertools.groupby, which operates on lists and other iterables.
import numpy as np
from itertools import groupby
[np.array(list(data)) for _, data in groupby(array)]
#[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), \
# array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]
This is not necessarily the most efficient method, because it involves conversions to and from lists.
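Since groupby only needs elements that can be compared for equality, the same pattern also covers strings. A small sketch with made-up sample data, keeping plain lists instead of converting back to arrays:

```python
from itertools import groupby

# Sample data invented for illustration
words = ["a", "a", "b", "a", "a", "a"]

# groupby starts a new group whenever the value changes,
# so only consecutive equal elements end up together
runs = [list(group) for _, group in groupby(words)]
print(runs)
# [['a', 'a'], ['b'], ['a', 'a', 'a']]
```

Note that the two runs of "a" stay separate. If all equal values should end up in one group, the input would have to be sorted first.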
Upvotes: 2