numerica
numerica

Reputation: 13

np.shuffle much slower than np.random.choice

I have an array of shape (N, 3) and I'd like to randomly shuffle the rows. N is on the order of 100,000.

I discovered that np.random.shuffle was bottlenecking my application. I tried replacing the shuffle with a call to np.random.choice and experienced a 10x speed-up. What's going on here? Why is it so much faster to call np.random.choice? Does the np.random.choice version generate a uniformly distributed shuffle?

import timeit

task_choice = '''
N = 100000
x = np.zeros((N, 3))
inds = np.random.choice(N, N, replace=False)
x[np.arange(N), :] = x[inds, :]
'''

task_shuffle = '''
N = 100000
x = np.zeros((N, 3))
np.random.shuffle(x)
'''

task_permute = '''
N = 100000
x = np.zeros((N, 3))
x = np.random.permutation(x)
'''

setup = 'import numpy as np'

timeit.timeit(task_choice, setup=setup, number=10)
>>> 0.11108078400138766

timeit.timeit(task_shuffle, setup=setup, number=10)
>>> 1.0411593900062144

timeit.timeit(task_permute, setup=setup, number=10)
>>> 1.1140159380011028

Edit: For anyone curious, I decided to go with the following solution since it is readable and outperformed all other methods in my benchmarks:

task_ind_permute = '''
N = 100000
x = np.zeros((N, 3))
inds = np.random.permutation(N)
x[np.arange(N), :] = x[inds, :]
'''

Upvotes: 0

Views: 1310

Answers (2)

CT Zhu
CT Zhu

Reputation: 54340

permutation and shuffle are linked, in fact permutation calls shuffle under the hood!!

The reason why shuffle is slower than permutation for multidimensional array is that permutation only need to shuffle the index along the first axis. Thus becomes a special case of shuffle of 1d array (the 1st if-else block).

This special case is also explained in the source as well:

# We trick gcc into providing a specialized implementation for
# the most common case, yielding a ~33% performance improvement.
# Note that apparently, only one branch can ever be specialized.

For shuffle on the otherhand, multidimensional ndarray operation requires a bounce buffer, creating that buffer, especially when the dimension is relative big, becomes expensive. Additionally, we can no longer use the trick mentioned above that helps the 1d case.

With replace=False and using choice to generate a new array of the same size, choice and permutation is the same, see here. The extra time would have to come from the time spend in creating intermediate index arrays.

Upvotes: 1

user3483203
user3483203

Reputation: 51155

You're comparing very different sized arrays here. In your first example, although you create an array of zeros, you simply use random.choice(100000, 100000), which pulls 100000 random values between 1-100000. In your second example your are shuffling a (100000, 3) shape array.

>>> x.shape
(100000, 3)
>>> np.random.choice(N, N, replace=False).shape
(100000,)

Timings on more equivalent samples:

In [979]: %timeit np.random.choice(N, N, replace=False)
2.6 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [980]: x = np.arange(100000)

In [981]: %timeit np.random.shuffle(x)
2.29 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [982]: x.shape == np.random.choice(N, N, replace=False).shape
Out[982]: True

Upvotes: 2

Related Questions