Fritz FABO
Fritz FABO

Reputation: 97

Transform all elements positionally below 0 into 0 in a matrix (Python)

This is a matrix :

matrix = [[1, 1, 1, 0], 
          [0, 5, 0, 1], 
          [2, 1, 3, 10]]

I want to change all the element positionally below 0 into 0 (on the same column).

The resulting matrix will be :

matrix = [[1, 1, 1, 0], 
          [0, 5, 0, 0], 
          [0, 1, 0, 0]]

I tried this so far. The return is empty

import numpy as np
def transform(matrix):
    newmatrix = np.asarray(matrix)
    i = 0
    j = 0
    for j in range(0,len(matrix[0])-1):
        while i < int(len(matrix))-1 and j < int(len(matrix[0]))-1:
            if newmatrix[i][j] == 0:
                np.put(newmatrix,newmatrix[i+1][j], 0 )
        i +=1
    return print (newmatrix)

Upvotes: 2

Views: 343

Answers (4)

AGN Gazer
AGN Gazer

Reputation: 8378

Method 1 (Original)

import numpy as np
def transform(matrix):
    mat = np.asarray(matrix)
    mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
    # Alternatively:
    # mat[~(mat != 0).cumprod(axis=0, dtype=np.bool)] = 0
    # or,
    # mat[~((mat != 0).cumprod(axis=0, dtype=np.bool))] = 0
    return mat

Then with your sample data, I get the following mat:

In [195]: matrix = [[1, 1, 1, 0], 
     ...:           [0, 5, 0, 1], 
     ...:           [2, 1, 3, 10]]

In [196]: transform(matrix)
Out[196]: 
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])

Method 2 (further optimized)

def transform2(matrix):
    mat = np.asarray(matrix)
    mat *= (mat != 0).cumprod(axis=0, dtype=np.bool)
    return mat

Method 3 (even more optimized)

def transform3(matrix):
    mat = np.asarray(matrix)
    mat *= mat.cumprod(axis=0, dtype=np.bool)
    return mat

Explanation

Let's look at the main statement (in Method 1):

mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0

We can split it into several "elementary" operations:

  1. Create a boolean mask containing False (numerically 0) where elements of mat are 0 and True (numerically 1) where they are non-zero:

    mask1 = np.not_equal(mat, 0)
    
  2. Using the fact that numerically False is 0, use cumprod() function (a good explanation can be found here: https://www.mathworks.com/help/matlab/ref/cumprod.html)

    mask2 = mask1.cumprod(axis=0)
    

    Since 1*1==1 and 0*0 or 0*1 is 0, all elements of this "mask" will be either 0 or 1. They will be 0 only in locations for which mask1 is zero and below (!) because of the "cumulative nature" of the product along the columns (hence axis=0).

  3. Now, we want set those elements of mat that correspond to 0 in mask2 to 0. To do this we create a boolean mask that is True where mask2 is 0 and False elsewhere. This can be easily achieved by applying logical (or binary) NOT to mask2:

    mask3 = np.logical_not(mask2)
    

    Using "logical" NOT here creates a boolean array thus we avoid explicit type conversion.

  4. Finally we use Boolean Indexing to select those elements of mat that need to be set to 0 and set them to 0:

    mat[mask3] = 0
    

OPTIONAL OPTIMIZATION

If you think of it, we can get rid of steps 3 and 4 if we do the following:

mask2 = mask1.cumprod(axis=0, dtype=np.bool) #convert result to boolean type 
mat *= mask2 # combined step 3&4

See "Method 2" section above for a complete implementation.

PERFORMANCE

There have been several additional answers that use numpy.ufunc.accumulate(). Fundamentally, all these methods revolve around the idea that 0 is a "special" value in the sense that 0*anything==0 or, in the case of @DSM's answer, that False=0<True=0 AND letting numpy perform "cumulative" operation on arrays.

There are some variations in performance but most are minimal except for my method #1 that is slower than other methods.

Here are some timing tests for more functions. NOTE: in order to perform tests correctly, we need to use large arrays. Small array tests will be measuring overhead, cashing, etc.

In [1]: import sys
    ...: import numpy as np
    ...: 

In [2]: print(sys.version)
    ...: 
3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]

In [3]: print(np.__version__)
    ...: 
1.12.1

In [4]: # Method 1 (Original)
    ...: def transform1(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
    ...:     return mat
    ...: 

In [5]: # Method 2:
    ...: def transform2(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat *= (mat != 0).cumprod(axis=0, dtype=np.bool)
    ...:     return mat
    ...: 

In [6]: # @DSM method:
    ...: def transform_DSM(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat *= np.minimum.accumulate(mat != 0)
    ...:     return mat
    ...: 

In [7]: # @DanielF method:
    ...: def transform_DanielF(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat[~np.logical_and.accumulate(mat, axis = 0)] = 0
    ...:     return mat
    ...: 

In [8]: # Optimized @DanielF method:
    ...: def transform_DanielF_optimized(matrix):
    ...:     mat = np.asarray(matrix)
    ...:     mat *= np.logical_and.accumulate(mat, dtype=np.bool)
    ...:     return mat
    ...: 

In [9]: matrix = np.random.randint(0, 20000, (20000, 20000))

In [10]: %timeit -n1 transform1(matrix)
22.1 s ± 241 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit -n1 transform2(matrix)
9.29 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [12]: %timeit -n1 transform3(matrix)
9.23 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %timeit -n1 transform_DSM(matrix)
9.24 s ± 195 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [14]: %timeit -n1 transform_DanielF(matrix)
10.3 s ± 219 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [15]: %timeit -n1 transform_DanielF_optimized(matrix)
9.27 s ± 187 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My initial solution (Method 1) is slowest while other methods are much faster. @DanielF original method is somewhat slower due to use of boolean indexing (but the optimized variant is as fast as other optimized methods).

Upvotes: 2

Daniel F
Daniel F

Reputation: 14399

A more optimized version of @AGNGazer solution using np.logical_and.accumulate and implicit boolean casting of integers (which doesn't require a lot of multiplication)

def transform(matrix):
    mat = np.asarray(matrix)
    mat[~np.logical_and.accumulate(mat, axis = 0)] = 0
    return mat

transform(m)
Out:
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])

Timings:

%timeit transform2(m) # AGN's solution
The slowest run took 44.73 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.93 µs per loop

%timeit transform(m)
The slowest run took 9.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.99 µs per loop

m = np.random.randint(0,5,(100,100))

%timeit transform(m)
The slowest run took 6.03 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 43.9 µs per loop

%timeit transform2(m)
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 50.4 µs per loop

Looks to be about a 15% speedup.

Upvotes: 1

DSM
DSM

Reputation: 353059

One variant of the cumprod approach is to use a cumulative minimum (or maximum). I prefer this slightly because you can use it to avoid any arithmetic operations beyond comparison, if you wanted, although it's hard to get worked up about it:

In [37]:  m
Out[37]: 
array([[ 1,  1,  1,  0],
       [ 0,  5,  0,  1],
       [ 2,  1,  3, 10]])

In [38]: m * np.minimum.accumulate(m != 0)
Out[38]: 
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])

In [39]: np.where(np.minimum.accumulate(m != 0), m, 0)
Out[39]: 
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])

Upvotes: 1

jpp
jpp

Reputation: 164663

This is a simple (though not optimised) algorithm:

import numpy as np
from numba import jit

m = np.array([[1, 1, 1, 0], 
              [0, 5, 0, 1], 
              [2, 1, 3, 10]])

@jit(nopython=True)
def zeroer(m):
    a, b = m.shape
    for j in range(b):
        for i in range(a):
            if m[i, j] == 0:
                m[i:, j] = 0
                break
    return m

zeroer(m)

# [[1 1 1 0]
#  [0 5 0 0]
#  [0 1 0 0]]

Upvotes: 1

Related Questions