Xavi Reyes

Reputation: 167

Call functions with varying parameters to modify a numpy array efficiently

I want to eliminate the inefficient for loop from this code:

import numpy as np

x = np.zeros((5,5))

for i in range(5):
    x[i] = np.random.choice(i+1, 5)

while keeping the same kind of output, for example:

[[0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 2. 2. 1. 0.]
 [1. 2. 3. 1. 0.]
 [1. 0. 3. 3. 1.]]

I have tried this

i = np.arange(5)
x[i] = np.random.choice(i+1, 5)

But it outputs

[[0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]]

Is it possible to remove the loop? If not, what is the most efficient approach for a large array and many repetitions?

Upvotes: 3

Views: 59

Answers (1)

Divakar

Reputation: 221524

Create a random integer array whose upper bound is the number of columns, using np.random.randint with its high argument set to the number of columns. Then apply a modulus operation, broadcast down the rows, so that each row gets its own limit given by its row number plus one. This gives a vectorized implementation like so -

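def create_rand_limited_per_row(m, n):
    # Exclusive upper limit for each row: 1, 2, ..., m
    s = np.arange(1, m+1)
    # Draw from [0, n), then wrap row i into [0, i+1) by broadcasting s as a column
    return np.random.randint(low=0, high=n, size=(m, n)) % s[:, None]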

Sample run -

In [45]: create_rand_limited_per_row(m=5,n=5)
Out[45]: 
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 2, 0, 2, 1],
       [0, 0, 1, 3, 0],
       [1, 2, 3, 3, 2]])
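
One caveat on the modulus trick: drawing from [0, n) and reducing modulo k is uniform over [0, k) only when k divides n, and it cannot reach values at or above n when a row limit exceeds n (that is, when m > n). If matching the uniform draws of np.random.choice matters, a minimal sketch of an alternative is to pass the per-row limits directly as a broadcast upper bound. This assumes NumPy 1.17 or later for np.random.default_rng and Generator.integers; the function name and the seed parameter are hypothetical, not part of the answer above.

```python
import numpy as np

def create_rand_uniform_per_row(m, n, seed=None):
    # Hypothetical helper (not from the original answer).
    rng = np.random.default_rng(seed)
    # Exclusive upper bound per row: 1, 2, ..., m, shaped (m, 1) so it
    # broadcasts across the n columns of the requested (m, n) output.
    high = np.arange(1, m + 1)[:, None]
    # Row i is drawn uniformly from the integers 0..i.
    return rng.integers(0, high, size=(m, n))

x = create_rand_uniform_per_row(5, 5, seed=0)
print(x)
```

Row 0 is always all zeros, since its only allowed value is 0; every other row i stays within 0..i regardless of how m compares to n.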

To leverage multiple cores on large data with the numexpr module -

import numexpr as ne

def create_rand_limited_per_row_numepxr(m,n):
    s = np.arange(1,m+1)[:,None]
    a = np.random.randint(0,n,(m,n))
    return ne.evaluate('a%s')

Benchmarking

# Original approach
def create_rand_limited_per_row_loopy(m,n):
    x = np.empty((m,n),dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i+1, n)
    return x

Timings on 1k x 1k data -

In [71]: %timeit create_rand_limited_per_row_loopy(m=1000,n=1000)
10 loops, best of 3: 20.6 ms per loop

In [72]: %timeit create_rand_limited_per_row(m=1000,n=1000)
100 loops, best of 3: 14.3 ms per loop

In [73]: %timeit create_rand_limited_per_row_numepxr(m=1000,n=1000)
100 loops, best of 3: 6.98 ms per loop
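
For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard-library timeit module. The helper name best_time, the shortened function names, and the 200 x 200 size are assumptions chosen to keep the run short; the numexpr variant is left out so the script depends only on NumPy. Absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def loopy(m, n):
    # Original row-by-row approach from the question
    x = np.empty((m, n), dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i + 1, n)
    return x

def vectorized(m, n):
    # randint followed by a broadcast modulus, as in the answer
    s = np.arange(1, m + 1)
    return np.random.randint(low=0, high=n, size=(m, n)) % s[:, None]

def best_time(func, m, n, repeat=3, number=5):
    # Hypothetical helper: best per-call time in seconds over
    # `repeat` runs of `number` calls each
    timer = timeit.Timer(lambda: func(m, n))
    return min(timer.repeat(repeat=repeat, number=number)) / number

results = {f.__name__: best_time(f, 200, 200) for f in (loopy, vectorized)}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```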

Upvotes: 2
