Reputation: 35
I am doing a simulation, and I want to build a numpy matrix to represent the status of the simulation.
For example, through simulation I got a matrix A=
matrix( [[0. , 0.024 , 0.088 , 0.154 , 0.206 ],
[0. , 0.3300 , 0.654 , 1 , 0.5 ],
[0. , 0.1770 , 0.371 , 0.5149487 , 0.610 ],
[0. , 0. , 0.5 , 0.8 , 0.9 ],
[0. , 0. , 1 , 0.9 , 0.8 ]])
if A[i,j] >= 1:
    B[i,j] = 1
else:
    B[i,j] = 0
In addition, once an element in a row is >= 1, all following elements in that row should equal 1.
How can I achieve this without using a for loop?
The B I want to get is:
matrix([[0, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 1, 1]])
Upvotes: 2
Reputation: 221614
We can use np.maximum.accumulate after comparing against 1.
Input :
In [79]: a
Out[79]:
matrix([[0. , 0.024 , 0.088 , 0.154 , 0.206 ],
[0. , 0.33 , 0.654 , 1. , 0.5 ],
[0. , 0.177 , 0.371 , 0.5149487, 0.61 ],
[0. , 0. , 0.5 , 0.8 , 0.9 ],
[0. , 0. , 1. , 0.9 , 0.8 ]])
Steps leading to solution :
# Compare against 1
In [93]: a>=1
Out[93]:
matrix([[False, False, False, False, False],
[False, False, False, True, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, True, False, False]])
# Get the accumulated max along each row. This ensures that once we
# encounter a match (True), it is maintained to the end of the row.
In [91]: np.maximum.accumulate(a>=1,axis=1)
Out[91]:
matrix([[False, False, False, False, False],
[False, False, False, True, True],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, True, True, True]])
# View as int8/uint8 dtype. The view gives the final output an int dtype
# without copying the boolean data, which keeps it memory efficient
In [92]: np.maximum.accumulate(a>=1,axis=1).view('i1')
Out[92]:
matrix([[0, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 1, 1]], dtype=int8)
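The steps above can be collected into one small function. This is only a sketch: the name latch_rows and the threshold parameter are made up for illustration and do not appear in the question, and the input is converted to a plain ndarray rather than kept as np.matrix.

```python
import numpy as np

def latch_rows(a, threshold=1.0):
    """Return an int8 array that is 1 from the first element >= threshold
    to the end of each row, and 0 everywhere else.

    Hypothetical helper name; it just wraps the three steps shown above.
    """
    a = np.asarray(a)                               # accept lists, ndarray or np.matrix
    mask = a >= threshold                           # step 1: compare against the threshold
    latched = np.maximum.accumulate(mask, axis=1)   # step 2: running max along each row
    return latched.view('i1')                       # step 3: reinterpret bool as int8, no copy

A = np.array([[0., 0.024, 0.088, 0.154,     0.206],
              [0., 0.33,  0.654, 1.,        0.5  ],
              [0., 0.177, 0.371, 0.5149487, 0.61 ],
              [0., 0.,    0.5,   0.8,       0.9  ],
              [0., 0.,    1.,    0.9,       0.8  ]])

B = latch_rows(A)
print(B)
```

If a different integer dtype is needed, latched.astype(int) would work as well, at the cost of a copy.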
Timings (since it seems we care about performance) on large datasets (the given sample repeated 10000x
along rows, then 1000x along both rows and cols) for all proposed solutions -
# Repeated along rows
In [106]: ar = np.repeat(a,10000,axis=0)
In [108]: %timeit (ar >= 1.).cumsum(axis=1, dtype=bool).view('i1')
...: %timeit np.maximum.accumulate(ar>=1,axis=1).view('i1')
582 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
593 µs ± 15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Repeated along rows and cols
In [109]: ar = np.repeat(np.repeat(a,1000,axis=0),1000,axis=1)
In [110]: %timeit (ar >= 1.).cumsum(axis=1, dtype=bool).view('i1')
...: %timeit np.maximum.accumulate(ar>=1,axis=1).view('i1')
77.9 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
77.3 ms ± 628 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
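To reproduce the comparison outside IPython, something like the following could be used. It is a rough sketch: the repeat count is much smaller than in the timings above so that it runs quickly, the variable names are illustrative, and the absolute numbers will differ by machine. It also checks that the two expressions give the same result.

```python
import timeit
import numpy as np

a = np.array([[0., 0.024, 0.088, 0.154,     0.206],
              [0., 0.33,  0.654, 1.,        0.5  ],
              [0., 0.177, 0.371, 0.5149487, 0.61 ],
              [0., 0.,    0.5,   0.8,       0.9  ],
              [0., 0.,    1.,    0.9,       0.8  ]])

# Smaller than the 10000x repeat used above, just to keep the sketch fast
ar = np.repeat(a, 1000, axis=0)

# The two expressions that were timed above
via_cumsum = (ar >= 1.).cumsum(axis=1, dtype=bool).view('i1')
via_maximum = np.maximum.accumulate(ar >= 1, axis=1).view('i1')

# Both should produce identical output
same = np.array_equal(via_cumsum, via_maximum)
print("identical results:", same)

# Time each expression with the standard-library timeit module
t_cumsum = timeit.timeit(
    lambda: (ar >= 1.).cumsum(axis=1, dtype=bool).view('i1'), number=20)
t_maximum = timeit.timeit(
    lambda: np.maximum.accumulate(ar >= 1, axis=1).view('i1'), number=20)
print(f"cumsum:  {t_cumsum / 20 * 1e6:.1f} us per loop")
print(f"maximum: {t_maximum / 20 * 1e6:.1f} us per loop")
```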
Upvotes: 2