Reputation: 132
I have a matrix in NumPy, an N×M ndarray, that looks like the following:
[
[ 0, 5, 11, 22, 0, 0, 11, 22],
[ 1, 4, 11, 20, 0, 4, 11, 20],
[ 1, 6, 11, 22, 0, 1, 11, 22],
[ 4, 7, 12, 21, 0, 4, 12, 21],
[ 5, 7, 12, 22, 0, 7, 12, 22],
[ 5, 7, 12, 22, 0, 5, 12, 22]
]
I would like to sort it by rows putting the zeros in each row first without changing the order of the other elements along the row.
My desired output is the following:
[
[ 0, 0, 0, 5, 11, 22, 11, 22],
[ 0, 1, 4, 11, 20, 4, 11, 20],
[ 0, 1, 6, 11, 22, 1, 11, 22],
[ 0, 4, 7, 12, 21, 4, 12, 21],
[ 0, 5, 7, 12, 22, 7, 12, 22],
[ 0, 5, 7, 12, 22, 5, 12, 22]
]
For efficiency reasons I am required to do it using numpy (so switching to Python's regular nested lists and doing the calculations on them is discouraged). The faster the code, the better.
How could I do that?
Best, Andrea
Upvotes: 4
Views: 187
Reputation: 67467
It is possible to get rid of all the Python looping by building a boolean mask with the help of np.tile and np.repeat, although you will have to time it on some larger example to see if it is worth the extra complexity:
rows, cols = a.shape
mask = a != 0
nonzeros_per_row = mask.sum(axis=1)
repeats = np.column_stack((cols-nonzeros_per_row, nonzeros_per_row)).ravel()
new_mask = np.repeat(np.tile([False, True], rows), repeats).reshape(rows, cols)
out = np.zeros_like(a)
out[new_mask] = a[mask]
>>> a
array([[ 0, 5, 11, 22, 0, 0, 11, 22],
[ 1, 4, 11, 20, 0, 4, 11, 20],
[ 1, 6, 11, 22, 0, 1, 11, 22],
[ 4, 7, 12, 21, 0, 4, 12, 21],
[ 5, 7, 12, 22, 0, 7, 12, 22],
[ 5, 7, 12, 22, 0, 5, 12, 22]])
>>> out
array([[ 0, 0, 0, 5, 11, 22, 11, 22],
[ 0, 1, 4, 11, 20, 4, 11, 20],
[ 0, 1, 6, 11, 22, 1, 11, 22],
[ 0, 4, 7, 12, 21, 4, 12, 21],
[ 0, 5, 7, 12, 22, 7, 12, 22],
[ 0, 5, 7, 12, 22, 5, 12, 22]])
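To check whether the extra complexity pays off, one option is to time the mask approach against a plain row loop on a larger array. The following is only a sketch: the function names, the 2000 x 50 test array and the number of repetitions are all made up for illustration, and the numbers will vary with machine and NumPy version.

```python
import timeit
import numpy as np

def zeros_first_mask(a):
    """Vectorized version: build the target mask with np.tile / np.repeat."""
    rows, cols = a.shape
    mask = a != 0
    nonzeros_per_row = mask.sum(axis=1)
    # Per row: (number of zeros, number of nonzeros), flattened row by row.
    repeats = np.column_stack((cols - nonzeros_per_row, nonzeros_per_row)).ravel()
    new_mask = np.repeat(np.tile([False, True], rows), repeats).reshape(rows, cols)
    out = np.zeros_like(a)
    out[new_mask] = a[mask]
    return out

def zeros_first_loop(a):
    """Row-by-row baseline that returns a new array."""
    out = np.empty_like(a)
    for i, row in enumerate(a):
        out[i] = np.concatenate((row[row == 0], row[row != 0]))
    return out

# Hypothetical test data: integers in 0..4, so roughly one entry in five is zero.
rng = np.random.default_rng(0)
big = rng.integers(0, 5, size=(2000, 50))

# Both versions should agree before their timings are compared.
assert np.array_equal(zeros_first_mask(big), zeros_first_loop(big))

for fn in (zeros_first_mask, zeros_first_loop):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds / 5 * 1e3:.2f} ms per call")
```

Which version wins is likely to depend on the shape: many short rows tend to favor the vectorized version, while a handful of very long rows leaves the loop with little overhead.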
Upvotes: 0
Reputation: 3879
This approach gets a binary array of where your array is zero and non-zero, then gets the sort index for that, then applies that to the original array.
You'll need an array as big as your to-be-sorted array to hold the index, but since it's all numpy operations it might be faster than looping.
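ind = (a != 0).astype(int)  # != 0 rather than > 0, so negative values count as nonzero
ind = ind.argsort(axis=1, kind='stable')  # a stable sort keeps the nonzero elements in their original order
a = a[np.arange(ind.shape[0])[:,None], ind]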
output:
>>> a
array([[ 0, 0, 0, 5, 11, 22, 11, 22],
[ 0, 1, 4, 11, 20, 4, 11, 20],
[ 0, 1, 6, 11, 22, 1, 11, 22],
[ 0, 4, 7, 12, 21, 4, 12, 21],
[ 0, 5, 7, 12, 22, 7, 12, 22],
[ 0, 5, 7, 12, 22, 5, 12, 22]])
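The same sort-by-key idea can also be written with np.take_along_axis, which handles the row indexing itself. This is a minimal sketch assuming NumPy 1.15 or later (for both take_along_axis and kind="stable"). The -4 in the second row is a made-up entry, added only to show why the key should test != 0 rather than > 0, and the stable sort is what keeps the nonzero elements in their original order, since argsort's default algorithm does not promise that.

```python
import numpy as np

a = np.array([[0,  5, 11, 22, 0, 0, 11, 22],
              [1, -4, 11, 20, 0, 4, 11, 20]])  # -4 is a hypothetical negative entry

# Key is False for zeros and True for nonzeros; False sorts first.
# kind="stable" keeps elements with equal keys in their original order.
order = np.argsort(a != 0, axis=1, kind="stable")

# Apply the per-row permutation without building the row index by hand.
out = np.take_along_axis(a, order, axis=1)
print(out)
# [[ 0  0  0  5 11 22 11 22]
#  [ 0  1 -4 11 20  4 11 20]]
```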
Upvotes: 2
Reputation: 2026
Maybe not the most efficient, since it loops over the rows, but perhaps a good starting point:
import numpy as np
a = np.array([[ 0, 5, 11, 22, 0, 0, 11, 22],
[ 1, 4, 11, 20, 0, 4, 11, 20],
[ 1, 6, 11, 22, 0, 1, 11, 22],
[ 4, 7, 12, 21, 0, 4, 12, 21],
[ 5, 7, 12, 22, 0, 7, 12, 22],
[ 5, 7, 12, 22, 0, 5, 12, 22]])
size = a.shape[1]
for i, line in enumerate(a):
    nz = np.nonzero(line)[0]                         # indices of the nonzero entries
    z = np.zeros(size - nz.shape[0], dtype=a.dtype)  # leading zeros that pad the row
    a[i] = np.concatenate((z, line[nz]))
For each line in a, you find the nonzero indices and prepend enough zeros to match the row size.
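If the input should stay untouched, the same per-row idea can be wrapped in a function that writes into a preallocated array of zeros, so the padding never has to be built explicitly. This is just a sketch, and the name zeros_first is made up here:

```python
import numpy as np

def zeros_first(a):
    """Return a copy of 2-D array `a` with each row's zeros moved to the front."""
    cols = a.shape[1]
    out = np.zeros_like(a)  # already holds the leading zeros, with a's dtype
    for i, row in enumerate(a):
        nz = row[row != 0]              # nonzero values, in original order
        out[i, cols - nz.size:] = nz    # right-align them; a no-op for an all-zero row
    return out

a = np.array([[0, 5, 11, 22, 0, 0, 11, 22],
              [0, 0, 0, 0, 0, 0, 0, 0]])
print(zeros_first(a))
# [[ 0  0  0  5 11 22 11 22]
#  [ 0  0  0  0  0  0  0  0]]
```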
Upvotes: 1
Reputation: 363253
Is a loop over rows allowed?
>>> a
array([[ 0, 5, 11, 22, 0, 0, 11, 22],
[ 1, 4, 11, 20, 0, 4, 11, 20],
[ 1, 6, 11, 22, 0, 1, 11, 22],
[ 4, 7, 12, 21, 0, 4, 12, 21],
[ 5, 7, 12, 22, 0, 7, 12, 22],
[ 5, 7, 12, 22, 0, 5, 12, 22]])
>>> for row in a:
...     row[:] = np.r_[row[row == 0], row[row != 0]]
...
>>> a
array([[ 0, 0, 0, 5, 11, 22, 11, 22],
[ 0, 1, 4, 11, 20, 4, 11, 20],
[ 0, 1, 6, 11, 22, 1, 11, 22],
[ 0, 4, 7, 12, 21, 4, 12, 21],
[ 0, 5, 7, 12, 22, 7, 12, 22],
[ 0, 5, 7, 12, 22, 5, 12, 22]])
Upvotes: 2