Beckyyyyy

Reputation: 35

How can I produce a conditional matrix in Python?

I am doing a simulation, and I want to build a numpy matrix to represent the status of the simulation.

For example, the simulation produced this matrix A:

matrix( [[0.        , 0.024     , 0.088     , 0.154     , 0.206      ],
         [0.        , 0.3300    , 0.654     , 1         , 0.5        ],
         [0.        , 0.1770    , 0.371     , 0.5149487 , 0.610      ],
         [0.        , 0.        , 0.5       , 0.8       , 0.9        ],
         [0.        , 0.        , 1         , 0.9       , 0.8        ]])

I want to build B from A as follows:

if A[i, j] >= 1:
    B[i, j] = 1
else:
    B[i, j] = 0

In addition, once one element in a row is >= 1, all the following elements in that row should equal 1. How can I achieve this without using a for loop?

The B I want to get is:

matrix([[0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 1]])
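
For reference, the rule described above can be spelled out as an explicit loop. This is only a baseline sketch for checking loop-free answers against, not the solution being asked for. It assumes A is held in a plain ndarray rather than np.matrix, and the flag name `seen` is made up for illustration.

```python
import numpy as np

# Sample data from the question, stored as a plain ndarray (an assumption).
A = np.array([
    [0.0, 0.024, 0.088, 0.154, 0.206],
    [0.0, 0.33, 0.654, 1.0, 0.5],
    [0.0, 0.177, 0.371, 0.5149487, 0.61],
    [0.0, 0.0, 0.5, 0.8, 0.9],
    [0.0, 0.0, 1.0, 0.9, 0.8],
])

B = np.zeros(A.shape, dtype=np.int8)
for i in range(A.shape[0]):
    seen = False  # hypothetical flag: has this row reached the threshold yet?
    for j in range(A.shape[1]):
        if A[i, j] >= 1:
            seen = True
        # Once the threshold has been reached, every later entry stays 1.
        B[i, j] = 1 if seen else 0

print(B)
```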

Upvotes: 2

Views: 136

Answers (2)

Divakar

Reputation: 221614

We can use np.maximum.accumulate after comparing against 1.

Input:

In [79]: a
Out[79]: 
matrix([[0.       , 0.024    , 0.088    , 0.154    , 0.206    ],
        [0.       , 0.33     , 0.654    , 1.       , 0.5      ],
        [0.       , 0.177    , 0.371    , 0.5149487, 0.61     ],
        [0.       , 0.       , 0.5      , 0.8      , 0.9      ],
        [0.       , 0.       , 1.       , 0.9      , 0.8      ]])

Steps leading to the solution:

# Compare against 1
In [93]: a>=1
Out[93]: 
matrix([[False, False, False, False, False],
        [False, False, False,  True, False],
        [False, False, False, False, False],
        [False, False, False, False, False],
        [False, False,  True, False, False]])

# Take the running maximum along each row. Once a True is encountered,
# it is carried through to the end of that row.
In [91]: np.maximum.accumulate(a>=1,axis=1)
Out[91]: 
matrix([[False, False, False, False, False],
        [False, False, False,  True,  True],
        [False, False, False, False, False],
        [False, False, False, False, False],
        [False, False,  True,  True,  True]])

# View the boolean result as int8 ('i1'). Both dtypes use one byte per
# element, so the view reuses the same memory instead of copying.
In [92]: np.maximum.accumulate(a>=1,axis=1).view('i1')
Out[92]: 
matrix([[0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 1]], dtype=int8)
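
The steps above can be collected into one self-contained script. This is a minimal sketch: it assumes the data is a plain ndarray instead of np.matrix, and the helper name `sticky_threshold` and its `threshold` parameter are made up for illustration.

```python
import numpy as np

# Sample data from the question, stored as a plain ndarray (an assumption;
# the session above uses np.matrix).
a = np.array([
    [0.0, 0.024, 0.088, 0.154, 0.206],
    [0.0, 0.33, 0.654, 1.0, 0.5],
    [0.0, 0.177, 0.371, 0.5149487, 0.61],
    [0.0, 0.0, 0.5, 0.8, 0.9],
    [0.0, 0.0, 1.0, 0.9, 0.8],
])

def sticky_threshold(x, threshold=1.0):
    """Hypothetical helper: 1 from the first entry >= threshold to the row end."""
    hit = x >= threshold                           # boolean mask of matches
    latched = np.maximum.accumulate(hit, axis=1)   # running max along each row
    return latched.view(np.int8)                   # reinterpret bool bytes as int8

b = sticky_threshold(a)
print(b)
```

Taking the running maximum of a boolean mask behaves like a running logical OR, which is why a single True is enough to fill the rest of its row.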

Timings (since performance seems to matter) for all proposed solutions on larger datasets, built by repeating the given sample 10000x along rows, and then 1000x along both rows and columns:

# Repeated along rows
In [106]: ar = np.repeat(a,10000,axis=0)

In [108]: %timeit (ar >= 1.).cumsum(axis=1, dtype=bool).view('i1')
     ...: %timeit np.maximum.accumulate(ar>=1,axis=1).view('i1')
582 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
593 µs ± 15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Repeated along rows and cols
In [109]: ar = np.repeat(np.repeat(a,1000,axis=0),1000,axis=1)

In [110]: %timeit (ar >= 1.).cumsum(axis=1, dtype=bool).view('i1')
     ...: %timeit np.maximum.accumulate(ar>=1,axis=1).view('i1')
77.9 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
77.3 ms ± 628 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
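
Outside IPython, a comparison along these lines could be run with the standard timeit module. This is a rough sketch rather than a rigorous benchmark: the function names `via_cumsum` and `via_maximum` are made up here, the repeat factor is smaller than in the session above so that it runs quickly, and the numbers will differ by machine. It also checks that both approaches return the same result.

```python
import timeit
import numpy as np

a = np.array([
    [0.0, 0.024, 0.088, 0.154, 0.206],
    [0.0, 0.33, 0.654, 1.0, 0.5],
    [0.0, 0.177, 0.371, 0.5149487, 0.61],
    [0.0, 0.0, 0.5, 0.8, 0.9],
    [0.0, 0.0, 1.0, 0.9, 0.8],
])

# Repeat the sample 100x along rows and columns (smaller than in the post).
ar = np.repeat(np.repeat(a, 100, axis=0), 100, axis=1)

def via_cumsum(x):
    # Cumulative sum with a bool accumulator.
    return (x >= 1.0).cumsum(axis=1, dtype=bool).view("i1")

def via_maximum(x):
    # Running maximum of the boolean mask.
    return np.maximum.accumulate(x >= 1, axis=1).view("i1")

# Both approaches should agree element for element.
same = np.array_equal(via_cumsum(ar), via_maximum(ar))

# Best of 3 repeats, 10 calls each; report seconds per call.
t_cumsum = min(timeit.repeat(lambda: via_cumsum(ar), number=10, repeat=3)) / 10
t_maximum = min(timeit.repeat(lambda: via_maximum(ar), number=10, repeat=3)) / 10

print(f"results equal: {same}")
print(f"cumsum:  {t_cumsum:.6f} s per call")
print(f"maximum: {t_maximum:.6f} s per call")
```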

Upvotes: 2

yatu

Reputation: 88266

You could check which values are greater than or equal to 1, then take the cumulative sum along axis 1 with dtype=bool, so that the sum acts as a running logical OR:

(a >= 1.).cumsum(axis=1, dtype=bool).view('i1')

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1]], dtype=int8)
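
As a side note on the final step, here is a small sketch of how .view differs from .astype when converting the boolean mask to integers. It assumes a plain ndarray input, and the variable names are made up for illustration.

```python
import numpy as np

a = np.array([
    [0.0, 0.024, 0.088, 0.154, 0.206],
    [0.0, 0.33, 0.654, 1.0, 0.5],
    [0.0, 0.177, 0.371, 0.5149487, 0.61],
    [0.0, 0.0, 0.5, 0.8, 0.9],
    [0.0, 0.0, 1.0, 0.9, 0.8],
])

# Running logical OR along each row via a bool-typed cumulative sum.
mask = (a >= 1.0).cumsum(axis=1, dtype=bool)

# .view reinterprets the same one-byte-per-element buffer as int8 (no copy).
as_view = mask.view(np.int8)

# .astype allocates a new array, here with four bytes per element.
as_copy = mask.astype(np.int32)

view_shares = np.shares_memory(mask, as_view)
copy_shares = np.shares_memory(mask, as_copy)

print(as_view.dtype, view_shares)
print(as_copy.dtype, copy_shares)
```

The view only works because bool and int8 have the same item size. For a wider integer type, .astype would be the way to go.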

Upvotes: 2
