Mathew

Reputation: 307

Identify vectors with same value in one column with numpy in python

I have a large 2D array of vectors. I want to split it into several smaller arrays according to the value in one of the columns: each run of consecutive rows that share the same value in that column should become its own array. For example, using the third column:

orig = np.array([[1, 2, 3], 
                 [3, 4, 3], 
                 [5, 6, 4], 
                 [7, 8, 4], 
                 [9, 0, 4], 
                 [8, 7, 3], 
                 [6, 5, 3]])

I want to turn this into three arrays consisting of rows 1-2, rows 3-5 and rows 6-7:

>>> a
array([[1, 2, 3],
       [3, 4, 3]])

>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])

>>> c
array([[8, 7, 3],
       [6, 5, 3]])

I'm new to python and numpy. Any help would be greatly appreciated.

Regards Mat

Edit: I reformatted the arrays to clarify the problem

Upvotes: 4

Views: 1704

Answers (3)

Mike Müller

Reputation: 85442

If a looks like this:

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 4],
       [1, 2, 4],
       [1, 2, 4],
       [1, 2, 3],
       [1, 2, 3]])

then this

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
indices = np.concatenate(([0], indices, [len(a)]))
res = [a[start:end] for start, end in zip(indices[:-1], indices[1:])]
print(res)

results in:

[array([[1, 2, 3],
       [1, 2, 3]]), array([[1, 2, 4],
       [1, 2, 4],
       [1, 2, 4]]), array([[1, 2, 3],
       [1, 2, 3]])]
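
To see where the + 1 comes from, here is a small sketch of the intermediate values for the same array (the name `changed` is only introduced for illustration). Comparing the column with itself shifted by one row flags the last row of each run, so adding 1 moves each index to the first row of the next run:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 4],
              [1, 2, 3],
              [1, 2, 3]])

col = a[:, -1]                       # last column: [3 3 4 4 4 3 3]

# Element i is True when row i and row i + 1 differ in that column.
changed = col[:-1] != col[1:]        # [False  True False False  True False]

# np.where gives the positions of the True entries ([1 4]);
# adding 1 turns them into the start index of each new run.
indices = np.where(changed)[0] + 1   # [2 5]

print(changed)
print(indices)
```

Rows 0-1, 2-4 and 5-6 are therefore the three runs.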

Update: np.split() is much nicer. There is no need to add the first and last index by hand:

col = a[:, -1]
indices = np.where(col[:-1] != col[1:])[0] + 1
res = np.split(a, indices)

Upvotes: 0

Jaime

Reputation: 67427

Using np.split:

>>> a, b, c = np.split(orig, np.where(orig[:-1, 2] != orig[1:, 2])[0]+1)

>>> a
array([[1, 2, 3],
       [3, 4, 3]])
>>> b
array([[5, 6, 4],
       [7, 8, 4],
       [9, 0, 4]])
>>> c
array([[8, 7, 3],
       [6, 5, 3]])
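
Note that unpacking into a, b, c only works when the column happens to contain exactly three runs. A minimal sketch of a more general version, assuming you are happy to keep the pieces in a list; `split_on_column_change` is a made-up helper name, not part of NumPy:

```python
import numpy as np


def split_on_column_change(arr, col):
    """Split a 2D array into runs of consecutive rows sharing arr[:, col].

    Hypothetical helper for illustration; returns a list of sub-arrays.
    """
    values = arr[:, col]
    # Indices of the first row of every new run (empty if there is one run).
    boundaries = np.flatnonzero(values[1:] != values[:-1]) + 1
    return np.split(arr, boundaries)


orig = np.array([[1, 2, 3],
                 [3, 4, 3],
                 [5, 6, 4],
                 [7, 8, 4],
                 [9, 0, 4],
                 [8, 7, 3],
                 [6, 5, 3]])

groups = split_on_column_change(orig, 2)
for group in groups:
    print(group)
```

If the column never changes, `boundaries` is empty and the list holds the whole array as a single piece.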

Upvotes: 8

Julien Spronck

Reputation: 15423

Nothing fancy here, but this good old-fashioned loop should do the trick:

import numpy as np

a = np.array([[1, 2, 3], 
              [1, 2, 3], 
              [1, 2, 4], 
              [1, 2, 4], 
              [1, 2, 4], 
              [1, 2, 3], 
              [1, 2, 3]])
groups = []
rows = [a[0]]  # rows of the current run, collected in a plain list
# Here I assume that the grouping is based on the last column;
# change the index accordingly if that is not the case.
prev = a[0][-1]
for row in a[1:]:
    if row[-1] == prev:
        rows.append(row)
    else:
        # The value changed: store the finished run as a 2D array.
        groups.append(np.array(rows))
        rows = [row]
    prev = row[-1]
groups.append(np.array(rows))  # don't forget the last run

print(groups)

## [array([[1, 2, 3],
##         [1, 2, 3]]),
##  array([[1, 2, 4],
##         [1, 2, 4],
##         [1, 2, 4]]),
##  array([[1, 2, 3],
##         [1, 2, 3]])]

Upvotes: 0
