Shuffling two NumPy arrays randomly and in sync, as efficiently as possible

Question

I want to shuffle data that I collected for my neural network.

The NumPy arrays have shapes np.shape(input) = (50, 9) and np.shape(output) = (50, 50). They need to be shuffled in sync, because I want to train the network with valid data. For example, if input[2][0:9] is swapped with input[32][0:9], then rows 2 and 32 of the output array should swap, and columns 2 and 32 should swap as well. I have about 10 million arrays, so for-looping, or creating empty arrays and then filling them, is very expensive in compute time. Do you know of an elegant solution for this? The shuffle should be random.

Quang Hoang · Accepted Answer

You can shuffle an index over the first dimension, then use it to reindex both arrays:

import numpy as np

# one index per row of input
x = np.arange(len(input))

# shuffle the index
np.random.shuffle(x)

# reindex
input, output = input[x], output[x]
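
Note that output[x] on its own permutes only the rows of output, while the question also asks for the matching columns to swap. A minimal sketch of that case follows, assuming output is a square matrix whose rows and columns both correspond to the rows of input. The names inp, out and perm and the random stand-in data are illustrative, not from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed used only to make this sketch reproducible

# Stand-in data with the shapes from the question
# (named inp/out here to avoid shadowing the built-in input()).
inp = rng.random((50, 9))
out = rng.random((50, 50))

# One random permutation of the 50 row indices.
perm = rng.permutation(len(inp))

# Reorder the rows of inp.
inp_shuffled = inp[perm]

# Apply the same permutation to both the rows and the columns of out,
# so that out_shuffled[i, j] == out[perm[i], perm[j]].
out_shuffled = out[np.ix_(perm, perm)]
```

np.ix_ builds an open mesh from the two index arrays, so the row and column reordering happens in a single fancy-indexing step instead of out[perm][:, perm], which creates an intermediate copy.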

Another approach is to generate random numbers and argsort them, which yields a random permutation of the indices:

x = np.argsort(np.random.rand(len(input)))
input, output = input[x], output[x]
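
If the roughly 10 million arrays can be stacked into batches, the argsort trick extends to drawing an independent permutation for every sample without a Python loop. The following is only a sketch under that assumption: the batched shapes (N, 50, 9) and (N, 50, 50), and the names inputs, outputs, perms and batch, are hypothetical and not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed used only to make this sketch reproducible

n_samples, n_rows = 4, 50  # small N for illustration; the question mentions ~10 million
inputs = rng.random((n_samples, n_rows, 9))
outputs = rng.random((n_samples, n_rows, n_rows))

# Argsort of random numbers along axis 1 gives one random permutation per sample.
perms = np.argsort(rng.random((n_samples, n_rows)), axis=1)  # shape (N, 50)

# Reorder the rows of every input with that sample's own permutation.
inputs_shuffled = np.take_along_axis(inputs, perms[:, :, None], axis=1)

# Reorder rows and columns of every output with the same permutation:
# outputs_shuffled[n, i, j] == outputs[n, perms[n, i], perms[n, j]].
batch = np.arange(n_samples)[:, None, None]
outputs_shuffled = outputs[batch, perms[:, :, None], perms[:, None, :]]
```

The three index arrays broadcast to shape (N, 50, 50), so the whole batch is gathered in one indexing operation. For very large N it may be necessary to process the data in chunks to keep memory use bounded.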
