Reputation: 73
I need to generate a 3×n matrix with random columns, where no column contains the same number more than once. I am currently using the code below:
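import numpy as np

n = 10
values = np.arange(0, 10)  # candidate values 0..9 (not named `set`, which would shadow the builtin)
# first column: 3 distinct values drawn without replacement
matrix = np.random.choice(values, size=3, replace=False)[:, None]
for i in range(n):
    column = np.random.choice(values, size=3, replace=False)[:, None]
    # append the new column on the right
    matrix = np.concatenate((matrix, column), axis=1)
print(matrix)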
which gives the output I expected:
[[2 1 7 2 1 9 7 4 5 2 7]
[4 6 3 5 9 8 1 3 8 4 0]
[3 5 0 0 4 5 4 0 2 5 3]]
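To confirm that a result really satisfies the "no repeats within a column" requirement, a quick vectorized check can sort each column and look for equal neighbours. This is a minimal sketch; the helper name `columns_are_distinct` is hypothetical and not part of the question's code.

```python
import numpy as np

def columns_are_distinct(matrix):
    # hypothetical helper: sort each column, then compare neighbouring entries;
    # a repeated value shows up as a zero difference after sorting
    diffs = np.diff(np.sort(matrix, axis=0), axis=0)
    return bool(np.all(diffs != 0))

# the example output shown above
example = np.array([[2, 1, 7, 2, 1, 9, 7, 4, 5, 2, 7],
                    [4, 6, 3, 5, 9, 8, 1, 3, 8, 4, 0],
                    [3, 5, 0, 0, 4, 5, 4, 0, 2, 5, 3]])
print(columns_are_distinct(example))                    # True
print(columns_are_distinct(np.array([[1], [1], [2]])))  # False
```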
However, the code does not seem to run fast enough. I know that implementing the for loop in Cython might help, but is there a more performant way to write this code solely in Python?
Upvotes: 0
Views: 104
Reputation:
You can speed it up further with Python's random module (probably due to this issue):
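import random
import numpy as np

# each random.sample call returns 3 distinct values from 0..9; transpose the n x 3 result to 3 x n
matrix = np.array([random.sample(range(10), 3) for _ in range(n)]).T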
n = 10**6
%timeit t = np.array([random.sample(range(10), 3) for _ in range(n)]).T
1 loop, best of 3: 6.25 s per loop
%%timeit
matrix = np.empty((3, n), dtype=int)  # np.int was removed in NumPy 1.24; plain int works everywhere
for i in range(n):
    matrix[:, i] = np.random.choice(10, size=3, replace=False)
1 loop, best of 3: 19.3 s per loop
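`%timeit` and `%%timeit` are IPython magics. Outside IPython, the same comparison can be sketched with the standard `timeit` module. The function names below are hypothetical wrappers around the two snippets above, and `n` is reduced (an assumption, chosen only so the script finishes quickly); no timings are claimed here, since they depend on the machine.

```python
import random
import timeit
import numpy as np

n = 10_000  # smaller than the 10**6 used above, so the comparison runs quickly

def with_random_sample():
    # one random.sample call per column; transpose n x 3 -> 3 x n
    return np.array([random.sample(range(10), 3) for _ in range(n)]).T

def with_np_choice():
    # preallocated array filled column by column with np.random.choice
    matrix = np.empty((3, n), dtype=int)
    for i in range(n):
        matrix[:, i] = np.random.choice(10, size=3, replace=False)
    return matrix

for func in (with_random_sample, with_np_choice):
    # one call per measurement, best of 3 repeats, mirroring "1 loop, best of 3"
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{func.__name__}: {best:.3f} s")
```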
Upvotes: 1
Reputation: 5167
As already mentioned in the comments, repeatedly concatenating to a NumPy array is a bad idea: each np.concatenate call allocates a new array and copies all of the existing data into it. Since you already know the final size of the result, you can allocate it once at the beginning and then just fill in the columns:
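matrix = np.empty((3, n), dtype=int)  # allocate the full result once (np.int no longer exists in NumPy >= 1.24)
for i in range(n):
    # fill column i with 3 distinct values from 0..9
    matrix[:, i] = np.random.choice(10, size=3, replace=False)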
At least on my machine, this is already 6 times faster than your version.
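Preallocation removes the repeated copying, but the loop still makes one np.random.choice call per column. A different technique, not taken from the answers above, removes the Python loop entirely: argsort a block of independent uniform random numbers down each column, which yields a random permutation of 0..9 per column, and keep the first 3 rows. This is a minimal sketch; the helper name `random_distinct_columns` and its `k`/`high`/`rng` parameters are hypothetical, and memory grows with `high * n` because the full permutation matrix is built.

```python
import numpy as np

def random_distinct_columns(n, k=3, high=10, rng=None):
    # hypothetical helper (not from the answers above)
    rng = np.random.default_rng() if rng is None else rng
    # each column of `keys` holds `high` independent uniform floats
    keys = rng.random((high, n))
    # argsort down each column gives a random permutation of 0..high-1
    perms = keys.argsort(axis=0)
    # the first k entries of a permutation are k distinct values
    return perms[:k]

matrix = random_distinct_columns(10, rng=np.random.default_rng(0))
print(matrix.shape)  # (3, 10)
```

The whole matrix is produced by two array operations, so the per-column Python overhead that dominates the loop-based versions disappears.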
Upvotes: 0