Reputation: 6597
I have the following array:
import numpy as np
a = np.array([[ 1, 2, 3],
[ 1, 2, 3],
[ 1, 2, 3]])
I understand that np.random.shuffle(a.T)
will shuffle the array along the row, but what I need is for it to shuffle each row independently. How can this be done in NumPy? Speed is critical, as there will be several million rows.
For this specific problem, each row will contain the same starting population.
Upvotes: 8
Views: 4160
Reputation: 35109
As of NumPy 1.20.0, released in January 2021, we have a permuted()
method on the new Generator
type (introduced with the new random API in NumPy 1.17.0, released in July 2019). This does exactly what you need:
import numpy as np
rng = np.random.default_rng()
a = np.array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
])
shuffled = rng.permuted(a, axis=1)
This gives you something like
>>> print(shuffled)
[[2 3 1]
[1 3 2]
[2 1 3]]
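As you can see, the rows are permuted independently. This is in sharp contrast with both rng.permutation()
and rng.shuffle()
, which move whole sub-arrays along the chosen axis and leave the contents of each sub-array in their original order.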
If you want an in-place update you can pass the original array as the out
keyword argument. And you can use the axis
keyword argument to choose the direction along which to shuffle your array.
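A minimal sketch of the in-place form described above, assuming NumPy 1.20 or newer. The seed and the name `result` are only there to make the example repeatable and checkable; they are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

a = np.array([
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3],
])

# Shuffle each row independently and write the result back into `a`.
# With `out` given, permuted() returns that same array.
result = rng.permuted(a, axis=1, out=a)
print(a)
```

Passing axis=0 instead would shuffle each column independently.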
Upvotes: 4
Reputation: 777
You can do it with NumPy without any loop or extra function, and much faster. E.g., we have an array of shape (6, 2) and we want a sub-array of shape (2, 2) with an independent random index for each column.
import numpy as np
test = np.array([[1, 1],
[2, 2],
[0.5, 0.5],
[0.3, 0.3],
[4, 4],
[7, 7]])
id_rnd = np.random.randint(6, size=(2, 2))  # random row indices (with replacement); use choice and range if you don't want replacement
new = np.take_along_axis(test, id_rnd, axis=0)
Out:
array([[2. , 2. ],
[0.5, 2. ]])
It works for any number of dimensions.
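The comment above mentions sampling without replacement but does not show it. One possible way to do that, sketched here under the assumption that NumPy 1.20+ is available (for Generator.permuted), is to shuffle a grid of row indices column by column and keep the first k of them. The names `k` and `index_grid` are hypothetical and not from the answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

test = np.array([[1, 1],
                 [2, 2],
                 [0.5, 0.5],
                 [0.3, 0.3],
                 [4, 4],
                 [7, 7]])
n_rows, n_cols = test.shape
k = 2  # hypothetical: how many rows to draw per column

# Grid whose every column is 0..n_rows-1; permuted() then shuffles
# each column of the grid independently of the others.
index_grid = np.tile(np.arange(n_rows)[:, None], (1, n_cols))
id_rnd = rng.permuted(index_grid, axis=0)[:k]  # shape (k, n_cols), no repeats within a column

new = np.take_along_axis(test, id_rnd, axis=0)
print(new)
```

Setting k equal to the number of rows turns this into a full independent shuffle of every column.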
Upvotes: 1
Reputation: 8608
Building on my comment to @Hun's answer, here's the fastest way to do this:
def shuffle_along(X):
    """Minimal in place independent-row shuffler."""
    for x in X:
        np.random.shuffle(x)
This works in-place and can only shuffle rows. If you need more options:
def shuffle_along(X, axis=0, inline=False):
    """More elaborate version of the above."""
    if not inline:
        X = X.copy()
    if axis == 0:
        [np.random.shuffle(x) for x in X]
    if axis == 1:
        [np.random.shuffle(x) for x in X.T]
    if not inline:
        return X
This, however, has the limitation of only working on 2d-arrays. For higher dimensional tensors, I would use:
def shuffle_along(X, axis=0, inline=True):
    """Shuffle along any axis of a tensor."""
    if not inline:
        X = X.copy()
    np.apply_along_axis(np.random.shuffle, axis, X)  # <-- I just changed this
    if not inline:
        return X
Upvotes: 1
Reputation: 879561
import numpy as np
np.random.seed(2018)
def scramble(a, axis=-1):
    """
    Return an array with the values of `a` shuffled along the given axis.
    The same random order is applied to every slice along that axis.
    """
    b = a.swapaxes(axis, -1)
    n = a.shape[axis]
    idx = np.random.choice(n, n, replace=False)
    b = b[..., idx]
    return b.swapaxes(axis, -1)
a = np.arange(4*9).reshape(4, 9)
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
# [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
# [18, 19, 20, 21, 22, 23, 24, 25, 26],
# [27, 28, 29, 30, 31, 32, 33, 34, 35]])
print(scramble(a, axis=1))
yields
[[ 3 8 7 0 4 5 1 2 6]
[12 17 16 9 13 14 10 11 15]
[21 26 25 18 22 23 19 20 24]
[30 35 34 27 31 32 28 29 33]]
while scrambling along the 0-axis:
print(scramble(a, axis=0))
yields
[[18 19 20 21 22 23 24 25 26]
[ 0 1 2 3 4 5 6 7 8]
[27 28 29 30 31 32 33 34 35]
[ 9 10 11 12 13 14 15 16 17]]
This works by first swapping the target axis with the last axis:
b = a.swapaxes(axis, -1)
This is a common trick used to standardize code which deals with one axis.
It reduces the general case to the specific case of dealing with the last axis.
Since in NumPy version 1.10 or higher swapaxes
returns a view, there is no copying involved and so calling swapaxes
is very quick.
Now we can generate a new index order for the last axis:
n = a.shape[axis]
idx = np.random.choice(n, n, replace=False)
Now we can shuffle b
along the last axis (the same idx is applied to every slice, so all slices end up in the same order):
b = b[..., idx]
and then reverse the swapaxes
to return an a
-shaped result:
return b.swapaxes(axis, -1)
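Because scramble draws a single idx, every row in the printed output ends up with the same column order. If each slice should get its own order, one possible variant (not part of this answer) is to sort random keys along the axis. The function name `scramble_independent` and its `rng` parameter are hypothetical, and the sketch assumes NumPy 1.17+ for default_rng.

```python
import numpy as np

def scramble_independent(a, axis=-1, rng=None):
    """Hypothetical variant of scramble(): a separate random order per slice."""
    rng = np.random.default_rng() if rng is None else rng
    # One random key per element; argsort along `axis` turns every 1-D
    # slice of keys into its own permutation of 0..n-1.
    idx = rng.random(a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

a = np.arange(4 * 9).reshape(4, 9)
out = scramble_independent(a, axis=1, rng=np.random.default_rng(2018))
print(out)
```

This returns a new array and costs a sort per slice, so it trades some speed for independence between slices.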
Upvotes: 5
Reputation: 3532
If you don't want a return
value and want to operate on the array directly, you can specify the indices to shuffle.
>>> import numpy as np
>>>
>>>
>>> a = np.array([[1,2,3], [1,2,3], [1,2,3]])
>>>
>>> # Shuffle row `2` independently
>>> np.random.shuffle(a[2])
>>> a
array([[1, 2, 3],
[1, 2, 3],
[3, 2, 1]])
>>>
>>> # Shuffle column `0` independently
>>> np.random.shuffle(a[:,0])
>>> a
array([[3, 2, 3],
[1, 2, 3],
[1, 2, 1]])
If you want a return value as well, you can use numpy.random.permutation
, in which case replace np.random.shuffle(a[n])
with a[n] = np.random.permutation(a[n])
.
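Warning: do not write a[n] = np.random.shuffle(a[n])
. shuffle
works in place and returns None
, so the assignment does not store a shuffled row/column: in a float array it will be filled with nan
instead, and in an integer array like the one above the assignment raises a TypeError.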
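A short sketch of the difference between the two calls, using the same array as above. The variable names `ret` and `n` are only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# shuffle() reorders row 2 in place and hands back nothing useful.
ret = np.random.shuffle(a[2])
print(ret)  # None

# permutation() returns a shuffled copy, which is safe to assign back.
n = 0
a[n] = np.random.permutation(a[n])
print(a)
```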
Upvotes: 2
Reputation: 3847
Good answer above. But I will throw in a quick and dirty way:
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
ignore_list_output = [np.random.shuffle(x) for x in a]
Then, a can be something like this
array([[2, 1, 3],
[4, 6, 5],
[9, 7, 8]])
Not very elegant but you can get this job done with just one short line.
Upvotes: 1