Reputation: 903
I have multiple numpy arrays with the same number of rows (axis_0) that I'd like to shuffle in unison. After one shuffle, I'd like to shuffle them again with a different random seed.
Until now, I've used the solution from "Better way to shuffle two numpy arrays in unison":
def shuffle_in_unison(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)
However, this doesn't work for multiple unison shuffles, since rng_state is always the same.
I tried to use RandomState in order to get a different seed for each call, but this doesn't even work for a single unison shuffle:
import numpy as np

a = np.array([1,2,3,4,5])
b = np.array([10,20,30,40,50])

def shuffle_in_unison(a, b):
    r = np.random.RandomState() # different state from /dev/urandom for each call
    state = r.get_state()
    np.random.shuffle(a) # array([4, 2, 1, 5, 3])
    np.random.set_state(state)
    np.random.shuffle(b) # array([40, 20, 50, 10, 30])
    # -> doesn't work
    return a,b

for i in range(10):
    a,b = shuffle_in_unison(a,b)
    print(a, b)
What am I doing wrong?
Edit:
For everyone that doesn't have huge arrays like me, just use the solution by Francesco (https://stackoverflow.com/a/47156309/3955022):
def shuffle_in_unison(a, b):
    n_elem = a.shape[0]
    indices = np.random.permutation(n_elem)
    return a[indices], b[indices]
The only drawback is that this is not an in-place operation, which is a pity for large arrays like mine (500G).
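For arrays too large to copy, an in-place variant seems possible by repairing the RandomState attempt above. That snippet creates r but then shuffles with the global np.random functions, so a is shuffled from the global state and b from r's state, and the two never match. The following is only a minimal sketch, assuming all arrays have the same length along axis 0; the function name shuffle_in_unison_inplace and its rng parameter are made up for this illustration.

```python
import numpy as np

def shuffle_in_unison_inplace(arrays, rng=None):
    """Shuffle all arrays in place along axis 0 with one shared permutation."""
    # Assumption: every array has the same length along axis 0.
    if rng is None:
        rng = np.random.RandomState()  # seeded from OS entropy
    state = rng.get_state()
    for arr in arrays:
        rng.set_state(state)  # rewind so each array sees the same random draws
        rng.shuffle(arr)      # shuffle with rng itself, not the global np.random
    # rng is left advanced past `state`, so a later call with the same rng
    # will typically use a different permutation.

rng = np.random.RandomState(0)
a = np.arange(100)
b = a * 10
c = np.stack([a, a * 10, a * 11], axis=1)
for _ in range(3):  # repeated unison shuffles
    shuffle_in_unison_inplace([a, b, c], rng)
```

Passing the same rng object to every call is what lets repeated shuffles differ from each other while the arrays stay aligned within each call.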
Upvotes: 2
Views: 3324
Reputation: 755
I don't normally have to shuffle my data more than once at a time. But this function accommodates any number of input arrays, as well as any number of random shuffles - and it shuffles in-place.
import numpy as np

def shuffle_arrays(arrays, shuffle_quant=1):
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    max_int = 2**(32 - 1) - 1

    for i in range(shuffle_quant):
        # one fresh seed per round of shuffling
        seed = np.random.randint(0, max_int)
        for arr in arrays:
            # re-create the state from the same seed so every array
            # gets the same permutation
            rstate = np.random.RandomState(seed)
            rstate.shuffle(arr)
It can be used like this:
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c], shuffle_quant=5)
A few things to note:
After the shuffle, the data can be split using np.split or referenced using slices, depending on the application.
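As a rough illustration of that note, the sketch below cuts one array into two parts both ways. The array data and the split point n_train are placeholder values chosen for this example, not something from the answer.

```python
import numpy as np

data = np.arange(10)  # stands in for an array that was already shuffled
n_train = 8           # hypothetical split point

# np.split cuts at the given index and returns a list of sub-arrays
train, test = np.split(data, [n_train])

# plain slices select the same rows
train_s, test_s = data[:n_train], data[n_train:]
```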
Upvotes: 1
Reputation: 8658
I don't know what you are doing wrong with the way you set the state. However, I found an alternative solution: instead of shuffling n arrays, shuffle their indices only once with numpy.random.choice and then reorder all the arrays.
import numpy as np

a = np.array([1,2,3,4,5])
b = np.array([10,20,30,40,50])

def shuffle_in_unison(a, b):
    n_elem = a.shape[0]
    # draw every index exactly once, in random order
    indices = np.random.choice(n_elem, size=n_elem, replace=False)
    return a[indices], b[indices]

for i in range(5):
    a, b = shuffle_in_unison(a, b)
    print(a, b)
I get:
[5 2 4 3 1] [50 20 40 30 10]
[1 3 4 2 5] [10 30 40 20 50]
[1 2 5 4 3] [10 20 50 40 30]
[3 2 1 4 5] [30 20 10 40 50]
[1 2 5 3 4] [10 20 50 30 40]
Edit:
Thanks to @Divakar for the suggestion.
Here is a more readable way to obtain the same result using numpy.random.permutation:
def shuffle_in_unison(a, b):
    n_elem = a.shape[0]
    indices = np.random.permutation(n_elem)
    return a[indices], b[indices]
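The same index-permutation idea should extend to any number of arrays. Below is a minimal sketch, assuming NumPy 1.17 or newer for np.random.default_rng; the name shuffle_together and its rng parameter are invented for this example. Like the version above, it returns shuffled copies and leaves the inputs untouched.

```python
import numpy as np

def shuffle_together(*arrays, rng=None):
    """Return copies of all arrays reordered by one shared permutation."""
    if rng is None:
        rng = np.random.default_rng()  # fresh entropy on each call
    n_elem = arrays[0].shape[0]
    if any(arr.shape[0] != n_elem for arr in arrays):
        raise ValueError("all arrays need the same length along axis 0")
    indices = rng.permutation(n_elem)  # one permutation for every array
    return tuple(arr[indices] for arr in arrays)

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
c = np.array([[1, 10, 11], [2, 20, 22], [3, 30, 33], [4, 40, 44], [5, 50, 55]])

# a seeded generator makes the run repeatable; omit it for a new shuffle each time
a2, b2, c2 = shuffle_together(a, b, c, rng=np.random.default_rng(42))
```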
Upvotes: 4
Reputation: 7421
I don't know exactly what you are doing wrong, but you have not chosen the solution with the most votes on that page, or the one with the second most votes. Try this one:
from sklearn.utils import shuffle

# X and Y are assumed to be existing arrays with the same number of rows
for i in range(10):
    X, Y = shuffle(X, Y, random_state=i)
    print("X - ", X, "Y - ", Y)
Output:
X - [3 5 1 4 2] Y - [30 50 10 40 20]
X - [1 5 2 3 4] Y - [10 50 20 30 40]
X - [2 4 5 3 1] Y - [20 40 50 30 10]
X - [3 1 4 2 5] Y - [30 10 40 20 50]
X - [3 2 1 5 4] Y - [30 20 10 50 40]
X - [4 3 2 1 5] Y - [40 30 20 10 50]
X - [1 5 4 3 2] Y - [10 50 40 30 20]
X - [1 3 4 5 2] Y - [10 30 40 50 20]
X - [2 4 3 1 5] Y - [20 40 30 10 50]
X - [1 2 4 3 5] Y - [10 20 40 30 50]
Upvotes: 2