Efficient way to shuffle one column at a time in a NumPy matrix

Question

I need to shuffle the columns of a NumPy matrix one at a time. This is my current code:

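import numpy as np

X = np.arange(9).reshape(3, 3).T  # the example matrix from the EDIT below
n, p = X.shape
for i in range(p):
    Xt = X.copy()                # full copy of X on every iteration
    np.random.shuffle(Xt[:, i])  # shuffle only column i of the copy
    print(Xt)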

I copy X to the variable Xt on every iteration, which seems very inefficient.

How can I speed this code up?

EDIT: Example given

X = [[0 3 6]
     [1 4 7]
     [2 5 8]]

The expected output of the for loop is:

[[2 3 6]
 [1 4 7]
 [0 5 8]]

[[0 5 6]
 [1 4 7]
 [2 3 8]]

[[0 3 7]
 [1 4 8]
 [2 5 6]]

Only one column should be shuffled each time. All the other columns should keep the values of the original matrix.

tom10 · Accepted Answer

Shuffling a column in numpy can be done in place and requires no copying at all:

import numpy as np
X = np.arange(25).reshape(5, 5).transpose()
print(X)
np.random.shuffle(X[:, 2])  # X[:, 2] is just a view onto this column of X, so X itself changes
print(X)

and the output is:

[[ 0  5 10 15 20]  # the original
 [ 1  6 11 16 21]
 [ 2  7 12 17 22]
 [ 3  8 13 18 23]
 [ 4  9 14 19 24]]

[[ 0  5 11 15 20]  # note that the middle column is shuffled here
 [ 1  6 10 16 21]
 [ 2  7 14 17 22]
 [ 3  8 13 18 23]
 [ 4  9 12 19 24]]

You're doing a lot of copying, and it's hard to tell if any of it is necessary for your overall needs, but it's not required for the shuffle.
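
To check the claim that no copy is involved, here is a minimal sketch (the 3x3 matrix and the variable names are only illustrative). It uses `np.shares_memory` to confirm that `X[:, 2]` is a view, then verifies that the in-place shuffle changes only that column:

```python
import numpy as np

X = np.arange(9).reshape(3, 3).T   # illustrative matrix, same as the question's example
original = X.copy()                # kept only so we can compare afterwards

col = X[:, 2]                      # basic slicing returns a view, not a copy
is_view = np.shares_memory(X, col)
print(is_view)

np.random.shuffle(col)             # shuffles the column inside X itself

# The other columns are untouched; column 2 holds the same values in some order.
others_unchanged = np.array_equal(X[:, :2], original[:, :2])
same_values = sorted(X[:, 2].tolist()) == sorted(original[:, 2].tolist())
print(others_unchanged, same_values)
```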

Edit:
Although the question is phrased in terms of shuffling, the shuffle itself can be done in place, so the actual inefficiency comes from the copying. The real question is therefore which copies the OP needs. Some duplication is unavoidable, of either index arrays or array values, because the original array has to be restored after each shuffle. The only saving available is to avoid copying the whole array on every cycle and copy just the one column instead (or, roughly equivalently, copy the whole matrix once, rather than p times as in the question's example and in @ajcr's answer). The following generator does this column by column:

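def sc(x):
    """Yield x with one column shuffled at a time, restoring it afterwards."""
    p = x.shape[1]  # use the argument x, not the global X
    for i in range(p):
        hold = x[:, i].copy()        # save only the column about to be shuffled
        np.random.shuffle(x[:, i])   # in-place shuffle through a view of column i
        yield x                      # the same array object every time, not a copy
        x[:, i] = hold               # restore the column before moving on

for i in sc(X):
    print(i)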

which gives:

[[ 2  5 11 15 20]    # #0 column shuffled
 [ 3  6 10 16 21]
 [ 0  7 14 17 22]
 [ 4  8 13 18 23]
 [ 1  9 12 19 24]]

[[ 0  5 11 15 20]    # #1 column shuffled
 [ 1  8 10 16 21]
 [ 2  9 14 17 22]
 [ 3  7 13 18 23]
 [ 4  6 12 19 24]]

#  etc
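
One caveat with a generator of this kind matters if the caller wants to keep the shuffled matrices. Every `yield` hands back the same array object, and the column is restored as soon as the generator resumes. Collecting the results with `list(...)` therefore gives p references to the restored original, so a copy has to be taken inside the loop if a result must outlive the iteration. A sketch of the effect follows; the name `shuffled_columns` is illustrative.

```python
import numpy as np

def shuffled_columns(x):
    """Yield x itself with column i shuffled; restore the column afterwards."""
    for i in range(x.shape[1]):
        hold = x[:, i].copy()
        np.random.shuffle(x[:, i])
        yield x
        x[:, i] = hold

X = np.arange(25).reshape(5, 5).T
original = X.copy()

# Every element is the very same object, already restored to the original.
collected = list(shuffled_columns(X))
all_same_object = all(m is X for m in collected)
restored = np.array_equal(X, original)

# Copying inside the loop keeps each shuffled state, at the cost of p full copies.
kept = [m.copy() for m in shuffled_columns(X)]
print(all_same_object, restored, len(kept))
```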

On the other hand, if a fresh copy of the entire array is needed for each column shuffle, the copying is where the time goes, and it makes no difference whether the columns are shuffled one at a time or all at once.
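
The "copy the whole matrix once" variant mentioned above can be sketched as follows. This is an illustration under assumptions, not the answer's original code: the name `shuffle_each_column` and the use of `np.random.default_rng` (available in NumPy 1.17 and later) are choices made here. It leaves the caller's array untouched by working on a single private copy and restoring each column from the original.

```python
import numpy as np

def shuffle_each_column(x, rng=None):
    """Yield a working copy of x with exactly one column shuffled at a time.

    x itself is never modified. The same working array is yielded on every
    iteration, so callers must copy it if they need to keep a result.
    """
    rng = np.random.default_rng() if rng is None else rng
    work = x.copy()                              # the only full copy
    for i in range(x.shape[1]):
        work[:, i] = rng.permutation(x[:, i])    # shuffled copy of column i
        yield work
        work[:, i] = x[:, i]                     # restore column i from the original

X = np.arange(9).reshape(3, 3).T
for Xt in shuffle_each_column(X, rng=np.random.default_rng(0)):
    print(Xt)
```

Each iteration here touches only one column's worth of values instead of the full matrix, which is the same saving the column-hold generator achieves.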
