Reputation: 3204
I have a rectangular 2D array to which I want to apply a 2D indexing array (e.g. arr[indexing_array]).
import numpy as np
import pandas as pd
np.random.seed(1234)
arr = np.random.rand(4,9)
[[0.19 0.62 0.44 0.79 0.78 0.27 0.28 0.8 0.96]
[0.88 0.36 0.5 0.68 0.71 0.37 0.56 0.5 0.01]
[0.77 0.88 0.36 0.62 0.08 0.37 0.93 0.65 0.4 ]
[0.79 0.32 0.57 0.87 0.44 0.8 0.14 0.7 0.7 ]]
I want the 2D indexing array to be a repeated lower triangular pattern, something similar to this for the array arr:
[[False False False False False False False False False]
[ True False False True False False True False False]
[ True True False True True False True True False]
[ True True True True True True True True True]]
Right now I'm creating this index with the following command:
nb_rep = 3 # The number of times the lower triangular array is repeated
k = 0 # An offset for the diagonal
np.arange(arr.shape[0])[:, None] + k > np.tile(np.arange(arr.shape[1]-6), nb_rep)
I tried a solution with the np.tril and np.tril_indices functions, but it was considerably slower than this one. Is there a way to simplify this? (I'm really not sure about my implementation on the right side of the >.) I used np.tile, but from what I found it might not be the fastest way to replicate arrays.
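For comparison, the np.tril route mentioned above could look roughly like the sketch below. The exact code that was tried is not shown, so this is an assumption: it builds one lower triangular block whose width is arr.shape[1] // nb_rep (3 here, the same value as arr.shape[1]-6) and tiles it. The names block_width, mask_broadcast, mask_tril and selected are illustrative only.

```python
import numpy as np

np.random.seed(1234)
arr = np.random.rand(4, 9)

nb_rep = 3  # number of times the lower triangular block is repeated
k = 0       # offset for the diagonal

# Assumption: the block width is the column count divided by nb_rep.
block_width = arr.shape[1] // nb_rep

# Broadcasting version: row index + k > repeated column index.
mask_broadcast = (
    np.arange(arr.shape[0])[:, None] + k > np.tile(np.arange(block_width), nb_rep)
)

# np.tril version: the condition i + k > j is the same as j <= i + (k - 1),
# which is what np.tril keeps when called with diagonal offset k - 1.
block = np.tril(np.ones((arr.shape[0], block_width), dtype=bool), k=k - 1)
mask_tril = np.tile(block, (1, nb_rep))

# Indexing with a 2D boolean mask returns the selected values as a 1D array.
selected = arr[mask_tril]
```

Both masks should come out identical, so the choice between them is mostly about speed and readability.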
Upvotes: 0
Views: 68
Reputation: 2002
I don't know if my method is the most efficient, but it seems to run faster than your code.
Your code:
import numpy as np
import pandas as pd
np.random.seed(1234)
arr = np.random.rand(4,9)
nb_rep = 3 # The number of times the lower triangular array is repeated
k = 0 # An offset for the diagonal
%timeit np.arange(arr.shape[0])[:, None] + k > np.tile(np.arange((arr.shape[1]-6)), nb_rep)
output:
The slowest run took 10.90 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 12.9 µs per loop
My method:
%timeit np.arange(arr.shape[0])[:, None] + k > ((np.arange(6*nb_rep) % arr.shape[0])[None, :])
output:
The slowest run took 15.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 6.02 µs per loop
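One caveat worth checking: with arr of shape (4, 9) and nb_rep = 3, the right-hand side above has 6*nb_rep = 18 columns and wraps every arr.shape[0] = 4 columns, so the result has shape (4, 18) rather than the (4, 9) mask from the question. A modulo-based variant that should reproduce that mask might look like the sketch below. The period arr.shape[1] // nb_rep is an assumption standing in for arr.shape[1]-6, and the names rows, period, mask_tile, mask_mod and answer_as_written are illustrative.

```python
import numpy as np

arr = np.random.rand(4, 9)
nb_rep = 3  # number of times the lower triangular block is repeated
k = 0       # offset for the diagonal

# Assumption: each repeated block is arr.shape[1] // nb_rep columns wide.
period = arr.shape[1] // nb_rep

rows = np.arange(arr.shape[0])[:, None]

# Mask built with np.tile, as in the question.
mask_tile = rows + k > np.tile(np.arange(period), nb_rep)

# Same mask built with a modulo on the column index instead of np.tile.
mask_mod = rows + k > np.arange(arr.shape[1]) % period

# The expression exactly as timed above, kept only to compare shapes.
answer_as_written = rows + k > (np.arange(6 * nb_rep) % arr.shape[0])[None, :]
```

Here mask_mod and mask_tile agree element by element, while answer_as_written has twice as many columns.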
Using a much larger array of shape (4000, 9000), the difference is even more significant.
Output of testing your code:
100 loops, best of 5: 46.8 ms per loop
Output of testing my code:
100 loops, best of 5: 133 µs per loop
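To repeat the comparison outside IPython, a sketch using the standard timeit module follows. It assumes both expressions are sized to produce the same (rows, cols) mask, with the block width taken as cols // nb_rep; the function names mask_with_tile and mask_with_modulo are hypothetical, the array size is kept small so it runs quickly, and no timings are claimed here.

```python
import timeit

import numpy as np


def mask_with_tile(rows, cols, nb_rep, k=0):
    """Repeated lower triangular mask, column index built with np.tile."""
    width = cols // nb_rep  # assumption: cols is a multiple of nb_rep
    return np.arange(rows)[:, None] + k > np.tile(np.arange(width), nb_rep)


def mask_with_modulo(rows, cols, nb_rep, k=0):
    """Same mask, column index built with a modulo instead of np.tile."""
    width = cols // nb_rep  # assumption: cols is a multiple of nb_rep
    return np.arange(rows)[:, None] + k > np.arange(width * nb_rep) % width


rows, cols, nb_rep = 400, 900, 3
n_calls = 20
for fn in (mask_with_tile, mask_with_modulo):
    seconds = timeit.timeit(lambda: fn(rows, cols, nb_rep), number=n_calls)
    print(f"{fn.__name__}: {seconds / n_calls * 1e6:.1f} us per call")
```

Because both functions allocate an output of the same shape, any remaining gap should reflect only how the column index is built.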
Upvotes: 1