Mert Ovn

Reputation: 73

Generating 2d numpy arrays from random columns

I need to generate a 3×n matrix of random columns, where no column contains the same number more than once. I am currently using the code below:

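import numpy as np

n = 10
values = np.arange(0, 10)  # renamed from `set`, which shadows the built-in
# First column, shaped (3, 1) so that later columns can be appended along axis 1
matrix = np.random.choice(values, size=3, replace=False)[:, None]
for i in range(n):  # n more columns, so n + 1 = 11 columns in total
    column = np.random.choice(values, size=3, replace=False)[:, None]
    matrix = np.concatenate((matrix, column), axis=1)
print(matrix)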

which gives the output I expected:

[[2 1 7 2 1 9 7 4 5 2 7]
 [4 6 3 5 9 8 1 3 8 4 0]
 [3 5 0 0 4 5 4 0 2 5 3]]
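To confirm that a generated matrix actually satisfies the constraint, a small checker can help. This is only a sketch; `columns_have_unique_entries` is a hypothetical helper, not a NumPy function. It sorts each column, so any repeated value shows up as two equal neighbours:

```python
import numpy as np

def columns_have_unique_entries(matrix):
    """Return True if no column of `matrix` contains a repeated value."""
    # After sorting along axis 0, duplicates within a column become adjacent.
    s = np.sort(matrix, axis=0)
    return not np.any(s[1:] == s[:-1])

example = np.array([[2, 1, 7, 2, 1, 9, 7, 4, 5, 2, 7],
                    [4, 6, 3, 5, 9, 8, 1, 3, 8, 4, 0],
                    [3, 5, 0, 0, 4, 5, 4, 0, 2, 5, 3]])
print(columns_have_unique_entries(example))  # True
```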

However, the code is not fast enough. I know that rewriting the for loop in Cython might help, but I would like to know whether there is a faster way to write this code in pure Python (with NumPy).

Upvotes: 0

Views: 104

Answers (2)

user2285236


You can speed it up further with Python's random module (probably due to this issue):

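import random
import numpy as np

# random.sample draws 3 distinct values per column; .T turns the rows into columns
matrix = np.array([random.sample(range(10), 3) for _ in range(n)]).T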

n = 10**6

%timeit t = np.array([random.sample(range(10), 3) for _ in range(n)]).T
1 loop, best of 3: 6.25 s per loop

%%timeit
matrix = np.empty((3, n), dtype=int)  # np.int was removed in NumPy 1.24
for i in range(n):
    matrix[:, i] = np.random.choice(10, size=3, replace=False)
1 loop, best of 3: 19.3 s per loop
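`%timeit` and `%%timeit` are IPython magics, so the comparison above only runs in IPython or Jupyter. A rough plain-Python equivalent using the standard `timeit` module might look like the sketch below. The function names are hypothetical, `n` is reduced so the script finishes quickly, and absolute timings will vary by machine:

```python
import random
import timeit

import numpy as np

n = 10**4  # much smaller than the 10**6 used above

def with_random_sample(n):
    # random.sample returns 3 distinct values from range(10) for each column.
    return np.array([random.sample(range(10), 3) for _ in range(n)]).T

def with_np_choice(n):
    # Preallocate the result, then fill it one column per iteration.
    matrix = np.empty((3, n), dtype=int)
    for i in range(n):
        matrix[:, i] = np.random.choice(10, size=3, replace=False)
    return matrix

for func in (with_random_sample, with_np_choice):
    # Best of 3 single runs, mirroring "1 loop, best of 3" from %timeit.
    seconds = min(timeit.repeat(lambda: func(n), number=1, repeat=3))
    print(f"{func.__name__}: {seconds:.3f} s")
```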

Upvotes: 1

jotasi

Reputation: 5167

As was already mentioned in the comments, repeatedly concatenating to a NumPy array is a bad idea: each np.concatenate call allocates a new array and copies all the existing data into it, so the total work grows quadratically with n. Since you already know the final size of the result, you can allocate it once at the beginning and then fill in the columns one by one:

matrix = np.empty((3, n), dtype=int)  # np.int was removed in NumPy 1.24
for i in range(n):
    matrix[:, i] = np.random.choice(10, size=3, replace=False)

At least on my machine, this is already 6 times faster than your version.
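Both answers still run a Python-level loop once per column. A fully vectorized alternative, not taken from either answer, is to draw a matrix of random keys and argsort each row. Every row then becomes a random permutation of 0..9, and its first three entries are distinct. This is a sketch that assumes NumPy 1.17 or later (for `np.random.default_rng`). `random_distinct_columns` and its parameters `k` and `m` are hypothetical names:

```python
import numpy as np

def random_distinct_columns(n, k=3, m=10, rng=None):
    """Return a (k, n) array whose columns hold k distinct values from range(m)."""
    rng = np.random.default_rng() if rng is None else rng
    # One row of m independent random keys per output column.
    keys = rng.random((n, m))
    # argsort turns each row of keys into a random permutation of 0..m-1;
    # keeping the first k entries gives k distinct values in random order.
    return np.argsort(keys, axis=1)[:, :k].T

matrix = random_distinct_columns(10, rng=np.random.default_rng(0))
print(matrix.shape)  # (3, 10)
```

The trade-off is memory: it builds an n×m temporary array of keys. For a large range m and a small k, the loop-based versions may therefore be preferable.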

Upvotes: 0
