James Huang

Reputation: 876

Shuffle a collection of arrays

I have several corresponding arrays of training data for a model, and I'm trying to shuffle them all into the same random order.

I thought this would work:

rand_inds = np.arange(len(d_bundle))
np.random.shuffle(rand_inds)

for i in [d_bundle, d2_bundle, d_location_bundle, output_bundle]:
    i = i[rand_inds]

However, this doesn't actually modify the arrays named in the list, so I'd have to reassign each one manually. To avoid that, it seems I could build another list such as c = [d_bundle, d2_bundle, d_location_bundle, output_bundle], run the loop over it, and then unpack c back into the original variables. But wouldn't that use more memory than necessary?

Is there a better way?

Upvotes: 1

Views: 57

Answers (2)

Ali_Sh

Reputation: 2816

IIUC, you can do this with indexing:

np.array(ll)[:, rand_inds]

This reorders every array at once and returns a new array; the originals are left unchanged, so assign the result back if you need it. Stacking also forces a common dtype: for example, float64 and float32 inputs both become float64. To get lists back, append .tolist() to the result.

  • NumPy arrays usually consume less memory than Python lists, and indexing them is much faster than an explicit Python loop.
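
As a rough illustration of the stacking approach, here is a minimal sketch. The arrays a and b are hypothetical stand-ins for the question's bundles, and it assumes they all have the same shape, since np.array cannot stack them into one rectangular array otherwise. The fixed seed is only there to make the run repeatable.

```python
import numpy as np

# Hypothetical stand-ins for the question's bundles (assumed equal length).
a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float64)
b = np.array([10.0, 20.0, 30.0, 40.0], dtype=np.float32)
ll = [a, b]

rng = np.random.default_rng(0)        # seeded only for repeatability
rand_inds = rng.permutation(len(a))   # one shared random ordering

stacked = np.array(ll)                # shape (2, 4); float32 is promoted to float64
shuffled = stacked[:, rand_inds]      # new array; a and b themselves are unchanged

a_shuf, b_shuf = shuffled             # unpack the rows back into separate names
```

Note that np.array(ll) copies the data, so this trades some memory for brevity.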

Upvotes: 1

Sarah Messer

Reputation: 4023

I got this to work by splitting up the loop:

import numpy as np

# generate test lists. These are your "d_bundle", etc.
l1 = [1, 2, 3, 4, 5]
l2 = [10, 20, 30, 40, 50]
l3 = ['a', 'b', 'c', 'd', 'e']
ll = [l1, l2, l3]  # This is your list of lists

rand_inds = np.arange(len(l1)) # initial, as-is ordering.
np.random.shuffle(rand_inds)   # shuffle rand_inds

for i in range(len(ll)):   # Handle each sublist separately
    l = ll[i]              # Select the sublist to modify
    newl = list(l)         # This reserves memory so we have separate input and output sublists
    for j in range(len(newl)):  # Shuffle the sublist
        newl[j] = l[rand_inds[j]]
    ll[i] = list(newl)     # Put the new sublist in the list of lists

l1, l2, l3 = tuple(ll)     # write the variables back to the original names. Order must match the original `ll` assignment.

I used a lot of list() calls to be very clear about when I wanted to deal with list contents rather than modifying things in-place. This is probably not the most memory-efficient solution, but it should work.
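
For comparison, a more compact sketch of the same idea follows. The loop in the question fails because `i = i[rand_inds]` only rebinds the loop variable `i`; it never touches the names d_bundle and so on. Reassigning all the names in one unpacking statement avoids that. This version uses the same test lists as above and np.random.permutation in place of arange plus shuffle; for NumPy arrays, `arr[rand_inds]` could replace the inner list comprehension.

```python
import numpy as np

# Same test lists as above; stand-ins for "d_bundle", etc.
l1 = [1, 2, 3, 4, 5]
l2 = [10, 20, 30, 40, 50]
l3 = ['a', 'b', 'c', 'd', 'e']

rand_inds = np.random.permutation(len(l1))  # one shared random ordering

# Build each reordered list and rebind all three names in a single statement.
# Order on the left must match the order of the tuple on the right.
l1, l2, l3 = ([lst[j] for j in rand_inds] for lst in (l1, l2, l3))
```

Each reordered list is a new object, so peak memory is roughly one extra copy per list until the old ones are garbage-collected.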

Upvotes: 1
