listener

Reputation: 59

Breaking down numpy array into smaller arrays of same value [Python]

I have the following numpy array:

array=[1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7]

I need to break this array into smaller arrays of same values such as

[1,1,1,1] and [3,3,3]

My code for this is as follows but it doesn't work:

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq)-size))
counter=0
sub_arr=[]
arr=[]
for i in range(len(array)):
    if(array[i]==array[i+1]):
        counter+=1
    else:
        break
        subarr=chunker(array,counter)
    arr.append(sub_arr)
    array=array[counter:]

What is an efficient way to break down the array into smaller arrays of equal values?

Upvotes: 2

Views: 405

Answers (3)

Mr. T

Reputation: 12410

A numpy solution for floats and integers:

import numpy as np
a = np.asarray([1,1,1,1,2,2,3,3,3,5,6,6,6,6,6,6,7])
#calculate differences between neighbouring elements and get index where element changes
#sample output for index would be [ 4  6  9 10 16]
index = np.where(np.diff(a) != 0)[0] + 1
#separate arrays
print(np.split(a, index))

Sample output:

[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]

This method relies on np.diff, so it won't work for strings. In that case, go with DYZ's itertools approach.
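For string data, one possible variant swaps np.diff for an elementwise comparison of neighbouring elements, which NumPy supports for string arrays as well. This is only a minimal sketch: split_on_change is a hypothetical helper name, not part of the answer above, and it assumes a one-dimensional input.

```python
import numpy as np

def split_on_change(a):
    """Split a 1-D array into sub-arrays of consecutive equal elements.

    Hypothetical helper for illustration; assumes 1-D input.
    """
    a = np.asarray(a)
    # An element that differs from its predecessor starts a new run.
    boundaries = np.flatnonzero(a[1:] != a[:-1]) + 1
    return np.split(a, boundaries)

words = np.array(["a", "a", "b", "c", "c"])
runs = split_on_change(words)
print([r.tolist() for r in runs])
# [['a', 'a'], ['b'], ['c', 'c']]
```

The comparison a[1:] != a[:-1] plays the same role as np.diff(a) != 0, but does not require subtraction to be defined for the element type.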

Upvotes: 3

andrew_reece

Reputation: 21264

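Here's an approach using Pandas. Note that it groups by value rather than by consecutive run, and returns the groups ordered by count: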

import pandas as pd 

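(pd.Series(array)
   .value_counts()
   .rename_axis("value")
   .reset_index(name="count")
   .apply(lambda row: [row["value"]] * row["count"], axis=1))

Explanation:
First, convert array to a Series, and use value_counts() to get a count of each unique entry. Naming the columns explicitly avoids relying on the default column labels, which differ between pandas versions:

counts = pd.Series(array).value_counts().rename_axis("value").reset_index(name="count")
   value  count
0      6      6
1      1      4
2      3      3
3      2      2
4      7      1
5      5      1

Then recreate each repeated-element list, using apply():

counts.apply(lambda row: [row["value"]] * row["count"], axis=1)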

0    [6, 6, 6, 6, 6, 6]
1          [1, 1, 1, 1]
2             [3, 3, 3]
3                [2, 2]
4                   [7]
5                   [5]
dtype: object

You can call .tolist() on the result to convert from a Series of lists to a list of lists, if needed.

Upvotes: 0

DYZ

Reputation: 57033

NumPy has poor support for this kind of grouping. I suggest using itertools, which operates on lists.

import numpy as np
from itertools import groupby
[np.array(list(data)) for _, data in groupby(array)]
#[array([1, 1, 1, 1]), array([2, 2]), array([3, 3, 3]), \
# array([5]), array([6, 6, 6, 6, 6, 6]), array([7])]

This is not necessarily the most efficient method, because it involves conversions to and from lists.
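If plain lists are acceptable as output, the conversion back to NumPy arrays can be skipped. The sketch below assumes that, and uses a hypothetical helper name, split_runs. It also illustrates that groupby only merges consecutive equal items, and that it works on strings as well as numbers.

```python
from itertools import groupby

def split_runs(seq):
    """Split seq into lists of consecutive equal items (hypothetical helper)."""
    # groupby starts a new group every time the value changes.
    return [list(group) for _, group in groupby(seq)]

numbers = [1, 1, 1, 1, 2, 2, 3, 3, 3, 5, 6, 6, 6, 6, 6, 6, 7]
print(split_runs(numbers))
# [[1, 1, 1, 1], [2, 2], [3, 3, 3], [5], [6, 6, 6, 6, 6, 6], [7]]

# Equal values that are not adjacent end up in separate groups.
print(split_runs(["a", "a", "b", "a"]))
# [['a', 'a'], ['b'], ['a']]
```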

Upvotes: 2
