Reputation: 307
I want to merge an two equal element in an array, let say I am having an array like this
np.array([[0,1,1,2,2],
[0,1,1,2,2],
[0,2,2,2,2]])
I want to produce something like this if I am directing it right
np.array([[0,0,2,0,4],
[0,0,2,0,4],
[0,0,4,0,4]])
And this if I am moving it up
np.array([[0,2,2,4,4],
[0,0,0,0,0],
[0,2,2,2,2]])
My current code simply loops through a normal list
for i in range(4):
for j in range(3):
if mat[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
matrix[i][j]*=2
matrix[i][j+1]=0
I prefer numpy and absence of loops if possible
Upvotes: 2
Views: 1433
Reputation: 14399
This task is deceptively difficult to do without loops! You'll need a bunch of high-level numpy
tricks to make it work. I sort of fly through them here, but I will try to link to other resources where I can.
From here, the best way to do row-wise comparison is:
a = np.array([[0,1,1,2,2],
[0,1,1,2,2],
[0,2,2,2,2]])
b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
b
array([[[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
[[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
[[0 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0]]],
dtype='|V20')
b.shape
(3, 1)
Notice that the innermost brackets are not an additional dimension, but an np.void
object that can be compared with things like np.unique
.
Still, getting the indices you want to keep isn't really easy, but here's the one-liner:
c = np.flatnonzero(np.r_[1, np.diff(np.unique(b, return_inverse = 1)[1])])
Eech. It's kinda messy. Basically you're looking for the indices where the lines change, and the first line. Normally you wouldn't need the np.unique
call and could just do np.diff(b)
, but you can't subtract np.void
. np.r_
is a shortcut for np.concatenate
that's a bit more readable. And np.flatnonzero
gives you the indices where your new array isn't zero (i.e. the indices you want to keep)
c
array([0, 2], dtype=int32)
There, now you can use some fancy ufunc.reduceat
math to do your addition:
d = np.add.reduceat(a, c, axis = 0)
d
array([[0, 2, 2, 4, 4],
[0, 2, 2, 2, 2]], dtype=int32)
OK, now to add the zeros, we'll just plug that into an np.zero
array using advanced indexing
e = np.zeros_like(a)
e[c] = d
e
array([[0, 2, 2, 4, 4],
[0, 0, 0, 0, 0],
[0, 2, 2, 2, 2]])
And there we go! You can go in other directions by transposing or flipping the matrix at the beginning and the end.
def reduce_duplicates(a):
b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
c = np.flatnonzero(np.r_[1, np.diff(np.unique(b, return_inverse = 1)[1])])
d = np.add.reduceat(a, c, axis = 0)
e = np.zeros_like(a)
e[c] = d
return e
reduce_duplicates(a.T[::-1,:])[::-1,:].T #reducing right
array([[0, 0, 2, 0, 4],
[0, 0, 2, 0, 4],
[0, 0, 4, 0, 4]])
I don't have numba
so I can't test speed against the other suggestion (knowing numba
it is probably slower), but it is loopless and numpy.
Upvotes: 1
Reputation: 83
A "vectorized" version of your function would be pretty messy, since the merges can happen at both even or odd indexes in each row/column, depending on preceding values in that row/column.
To illustrate, see how this vectorized version works on your (horizontal) example which happens to have all merges land on odd indexes:
>>> x
array([[0, 1, 1, 2, 2],
[0, 1, 1, 2, 2],
[0, 2, 2, 2, 2]])
>>> y=x==np.roll(x, 1, axis=1); y[:,1::2]=False; x*y*2
array([[0, 0, 2, 0, 4],
[0, 0, 2, 0, 4],
[0, 0, 4, 0, 4]])
But if I shift one of the rows by 1, it no longer works:
>>> x2
array([[0, 1, 1, 2, 2],
[0, 0, 1, 1, 2],
[0, 2, 2, 2, 2]])
>>> y=x2==np.roll(x2, 1, axis=1); y[:,1::2]=False; x2*y*2
array([[0, 0, 2, 0, 4],
[0, 0, 0, 0, 0],
[0, 0, 4, 0, 4]])
I'm not sure what strategy I would take next, if it is possible to implement this in a vectorized fashion, but it wouldn't be very clean.
I would suggest using numba for something like this. It will keep your code readable and should make it faster. Just add the @jit
decorator to your function and evaluate how much it improves performance.
EDIT: I did some timing for you. Also there is a small fix to your function to make it coincide with your example.
>>> def foo(matrix):
... for i in range(matrix.shape[0]):
... for j in range(matrix.shape[1]-1):
... if matrix[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
... matrix[i][j+1]*=2
... matrix[i][j]=0
...
>>> from numba import jit
>>> @jit
... def foo2(matrix):
... for i in range(matrix.shape[0]):
... for j in range(matrix.shape[1]-1):
... if matrix[i][j]==matrix[i][j+1] and matrix[i][j]!=0:
... matrix[i][j+1]*=2
... matrix[i][j]=0
...
>>> import time
>>> z=np.random.random((1000,1000)); start=time.time(); foo(z); print(time.time()-start)
1.0277159214
>>> z=np.random.random((1000,1000)); start=time.time(); foo2(z); print(time.time()-start)
0.00354909896851
Upvotes: 0