Akins Nazri
Akins Nazri

Reputation: 307

PYTHON: How to merge equal element numpy array

I want to merge an two equal element in an array, let say I am having an array like this

np.array([[0,1,1,2,2],
          [0,1,1,2,2],
          [0,2,2,2,2]])

I want to produce something like this if I am directing it right

np.array([[0,0,2,0,4],
          [0,0,2,0,4],
          [0,0,4,0,4]])

And this if I am moving it up

np.array([[0,2,2,4,4],
          [0,0,0,0,0],
          [0,2,2,2,2]])

My current code simply loops through a normal list

    for i in range(4):
     for j in range(3):
         if mat[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
             matrix[i][j]*=2
             matrix[i][j+1]=0

I prefer numpy and absence of loops if possible

Upvotes: 2

Views: 1433

Answers (2)

Daniel F
Daniel F

Reputation: 14399

This task is deceptively difficult to do without loops! You'll need a bunch of high-level numpy tricks to make it work. I sort of fly through them here, but I will try to link to other resources where I can.

From here, the best way to do row-wise comparison is:

a = np.array([[0,1,1,2,2],
              [0,1,1,2,2],
              [0,2,2,2,2]])

b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))

b

array([[[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
       [[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
       [[0 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0]]], 
       dtype='|V20')

b.shape
(3, 1)

Notice that the innermost brackets are not an additional dimension, but an np.void object that can be compared with things like np.unique.

Still, getting the indices you want to keep isn't really easy, but here's the one-liner:

c = np.flatnonzero(np.r_[1, np.diff(np.unique(b, return_inverse = 1)[1])])

Eech. It's kinda messy. Basically you're looking for the indices where the lines change, and the first line. Normally you wouldn't need the np.unique call and could just do np.diff(b), but you can't subtract np.void. np.r_ is a shortcut for np.concatenate that's a bit more readable. And np.flatnonzero gives you the indices where your new array isn't zero (i.e. the indices you want to keep)

c
array([0, 2], dtype=int32)

There, now you can use some fancy ufunc.reduceat math to do your addition:

d = np.add.reduceat(a, c, axis = 0)

d

array([[0, 2, 2, 4, 4],
       [0, 2, 2, 2, 2]], dtype=int32)

OK, now to add the zeros, we'll just plug that into an np.zero array using advanced indexing

e = np.zeros_like(a)
e[c] = d
e

array([[0, 2, 2, 4, 4],
       [0, 0, 0, 0, 0],
       [0, 2, 2, 2, 2]])

And there we go! You can go in other directions by transposing or flipping the matrix at the beginning and the end.

def reduce_duplicates(a):
    b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
    c = np.flatnonzero(np.r_[1, np.diff(np.unique(b, return_inverse = 1)[1])])
    d = np.add.reduceat(a, c, axis = 0)
    e = np.zeros_like(a)
    e[c] = d
    return e

reduce_duplicates(a.T[::-1,:])[::-1,:].T  #reducing right

array([[0, 0, 2, 0, 4],
       [0, 0, 2, 0, 4],
       [0, 0, 4, 0, 4]])

I don't have numba so I can't test speed against the other suggestion (knowing numba it is probably slower), but it is loopless and numpy.

Upvotes: 1

chrisbarber
chrisbarber

Reputation: 83

A "vectorized" version of your function would be pretty messy, since the merges can happen at both even or odd indexes in each row/column, depending on preceding values in that row/column.

To illustrate, see how this vectorized version works on your (horizontal) example which happens to have all merges land on odd indexes:

>>> x
array([[0, 1, 1, 2, 2],
       [0, 1, 1, 2, 2],
       [0, 2, 2, 2, 2]])
>>> y=x==np.roll(x, 1, axis=1); y[:,1::2]=False; x*y*2
array([[0, 0, 2, 0, 4],
       [0, 0, 2, 0, 4],
       [0, 0, 4, 0, 4]])

But if I shift one of the rows by 1, it no longer works:

>>> x2
array([[0, 1, 1, 2, 2],
       [0, 0, 1, 1, 2],
       [0, 2, 2, 2, 2]])
>>> y=x2==np.roll(x2, 1, axis=1); y[:,1::2]=False; x2*y*2
array([[0, 0, 2, 0, 4],
       [0, 0, 0, 0, 0],
       [0, 0, 4, 0, 4]])

I'm not sure what strategy I would take next, if it is possible to implement this in a vectorized fashion, but it wouldn't be very clean.

I would suggest using numba for something like this. It will keep your code readable and should make it faster. Just add the @jit decorator to your function and evaluate how much it improves performance.

EDIT: I did some timing for you. Also there is a small fix to your function to make it coincide with your example.

>>> def foo(matrix):
...     for i in range(matrix.shape[0]):
...         for j in range(matrix.shape[1]-1):
...             if matrix[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
...                 matrix[i][j+1]*=2
...                 matrix[i][j]=0
...
>>> from numba import jit
>>> @jit
... def foo2(matrix):
...     for i in range(matrix.shape[0]):
...         for j in range(matrix.shape[1]-1):
...             if matrix[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
...                 matrix[i][j+1]*=2
...                 matrix[i][j]=0
...
>>> import time
>>> z=np.random.random((1000,1000)); start=time.time(); foo(z); print(time.time()-start)
1.0277159214
>>> z=np.random.random((1000,1000)); start=time.time(); foo2(z); print(time.time()-start)
0.00354909896851

Upvotes: 0

Related Questions