Reputation: 97
This is a matrix :
matrix = [[1, 1, 1, 0],
[0, 5, 0, 1],
[2, 1, 3, 10]]
I want to change all the element positionally below 0 into 0 (on the same column).
The resulting matrix will be :
matrix = [[1, 1, 1, 0],
[0, 5, 0, 0],
[0, 1, 0, 0]]
I tried this so far. The return is empty
import numpy as np
def transform(matrix):
newmatrix = np.asarray(matrix)
i = 0
j = 0
for j in range(0,len(matrix[0])-1):
while i < int(len(matrix))-1 and j < int(len(matrix[0]))-1:
if newmatrix[i][j] == 0:
np.put(newmatrix,newmatrix[i+1][j], 0 )
i +=1
return print (newmatrix)
Upvotes: 2
Views: 343
Reputation: 8378
import numpy as np
def transform(matrix):
mat = np.asarray(matrix)
mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
# Alternatively:
# mat[~(mat != 0).cumprod(axis=0, dtype=np.bool)] = 0
# or,
# mat[~((mat != 0).cumprod(axis=0, dtype=np.bool))] = 0
return mat
Then with your sample data, I get the following mat
:
In [195]: matrix = [[1, 1, 1, 0],
...: [0, 5, 0, 1],
...: [2, 1, 3, 10]]
In [196]: transform(matrix)
Out[196]:
array([[1, 1, 1, 0],
[0, 5, 0, 0],
[0, 1, 0, 0]])
def transform2(matrix):
mat = np.asarray(matrix)
mat *= (mat != 0).cumprod(axis=0, dtype=np.bool)
return mat
def transform3(matrix):
mat = np.asarray(matrix)
mat *= mat.cumprod(axis=0, dtype=np.bool)
return mat
Let's look at the main statement (in Method 1):
mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
We can split it into several "elementary" operations:
Create a boolean mask containing False
(numerically 0
) where elements of mat
are 0
and True
(numerically 1
) where they are non-zero:
mask1 = np.not_equal(mat, 0)
Using the fact that numerically False
is 0, use cumprod()
function (a good explanation can be found here: https://www.mathworks.com/help/matlab/ref/cumprod.html)
mask2 = mask1.cumprod(axis=0)
Since 1*1==1
and 0*0
or 0*1
is 0
, all elements of this "mask" will be either 0
or 1
. They will be 0
only in locations for which mask1
is zero and below (!) because of the "cumulative nature" of the product along the columns (hence axis=0
).
Now, we want set those elements of mat
that correspond to 0
in mask2
to 0
. To do this we create a boolean mask that is True
where mask2
is 0
and False
elsewhere. This can be easily achieved by applying logical (or binary) NOT to mask2
:
mask3 = np.logical_not(mask2)
Using "logical" NOT here creates a boolean array thus we avoid explicit type conversion.
Finally we use Boolean Indexing to select those elements of mat
that need to be set to 0
and set them to 0
:
mat[mask3] = 0
If you think of it, we can get rid of steps 3 and 4 if we do the following:
mask2 = mask1.cumprod(axis=0, dtype=np.bool) #convert result to boolean type
mat *= mask2 # combined step 3&4
See "Method 2" section above for a complete implementation.
There have been several additional answers that use numpy.ufunc.accumulate()
. Fundamentally, all these methods revolve around the idea that 0
is a "special" value in the sense that 0*anything==0
or, in the case of @DSM's answer, that False=0<True=0
AND letting numpy
perform "cumulative" operation on arrays.
There are some variations in performance but most are minimal except for my method #1 that is slower than other methods.
Here are some timing tests for more functions. NOTE: in order to perform tests correctly, we need to use large arrays. Small array tests will be measuring overhead, cashing, etc.
In [1]: import sys
...: import numpy as np
...:
In [2]: print(sys.version)
...:
3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
In [3]: print(np.__version__)
...:
1.12.1
In [4]: # Method 1 (Original)
...: def transform1(matrix):
...: mat = np.asarray(matrix)
...: mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
...: return mat
...:
In [5]: # Method 2:
...: def transform2(matrix):
...: mat = np.asarray(matrix)
...: mat *= (mat != 0).cumprod(axis=0, dtype=np.bool)
...: return mat
...:
In [6]: # @DSM method:
...: def transform_DSM(matrix):
...: mat = np.asarray(matrix)
...: mat *= np.minimum.accumulate(mat != 0)
...: return mat
...:
In [7]: # @DanielF method:
...: def transform_DanielF(matrix):
...: mat = np.asarray(matrix)
...: mat[~np.logical_and.accumulate(mat, axis = 0)] = 0
...: return mat
...:
In [8]: # Optimized @DanielF method:
...: def transform_DanielF_optimized(matrix):
...: mat = np.asarray(matrix)
...: mat *= np.logical_and.accumulate(mat, dtype=np.bool)
...: return mat
...:
In [9]: matrix = np.random.randint(0, 20000, (20000, 20000))
In [10]: %timeit -n1 transform1(matrix)
22.1 s ± 241 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [11]: %timeit -n1 transform2(matrix)
9.29 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [12]: %timeit -n1 transform3(matrix)
9.23 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [13]: %timeit -n1 transform_DSM(matrix)
9.24 s ± 195 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [14]: %timeit -n1 transform_DanielF(matrix)
10.3 s ± 219 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [15]: %timeit -n1 transform_DanielF_optimized(matrix)
9.27 s ± 187 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
My initial solution (Method 1) is slowest while other methods are much faster. @DanielF original method is somewhat slower due to use of boolean indexing (but the optimized variant is as fast as other optimized methods).
Upvotes: 2
Reputation: 14399
A more optimized version of @AGNGazer solution using np.logical_and.accumulate
and implicit boolean casting of integers (which doesn't require a lot of multiplication)
def transform(matrix):
mat = np.asarray(matrix)
mat[~np.logical_and.accumulate(mat, axis = 0)] = 0
return mat
transform(m)
Out:
array([[1, 1, 1, 0],
[0, 5, 0, 0],
[0, 1, 0, 0]])
Timings:
%timeit transform2(m) # AGN's solution
The slowest run took 44.73 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.93 µs per loop
%timeit transform(m)
The slowest run took 9.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.99 µs per loop
m = np.random.randint(0,5,(100,100))
%timeit transform(m)
The slowest run took 6.03 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 43.9 µs per loop
%timeit transform2(m)
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 50.4 µs per loop
Looks to be about a 15% speedup.
Upvotes: 1
Reputation: 353059
One variant of the cumprod
approach is to use a cumulative minimum (or maximum). I prefer this slightly because you can use it to avoid any arithmetic operations beyond comparison, if you wanted, although it's hard to get worked up about it:
In [37]: m
Out[37]:
array([[ 1, 1, 1, 0],
[ 0, 5, 0, 1],
[ 2, 1, 3, 10]])
In [38]: m * np.minimum.accumulate(m != 0)
Out[38]:
array([[1, 1, 1, 0],
[0, 5, 0, 0],
[0, 1, 0, 0]])
In [39]: np.where(np.minimum.accumulate(m != 0), m, 0)
Out[39]:
array([[1, 1, 1, 0],
[0, 5, 0, 0],
[0, 1, 0, 0]])
Upvotes: 1
Reputation: 164663
This is a simple (though not optimised) algorithm:
import numpy as np
from numba import jit
m = np.array([[1, 1, 1, 0],
[0, 5, 0, 1],
[2, 1, 3, 10]])
@jit(nopython=True)
def zeroer(m):
a, b = m.shape
for j in range(b):
for i in range(a):
if m[i, j] == 0:
m[i:, j] = 0
break
return m
zeroer(m)
# [[1 1 1 0]
# [0 5 0 0]
# [0 1 0 0]]
Upvotes: 1