Reputation: 17617
I need to shuffle the columns of a NumPy matrix one at a time. This is my current code:
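```python
import numpy as np

n, p = X.shape
val = []
for i in range(p):
    Xt = X.copy()                 # full copy of X on every iteration
    np.random.shuffle(Xt[:, i])   # shuffle only column i of the copy
    print(Xt)
```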
On each iteration I copy X into Xt, which seems very inefficient.
How can I speed this code up?
EDIT: An example. Given

```
X = [[0 3 6]
     [1 4 7]
     [2 5 8]]
```
The expected output of the for loop is:

```
[[2 3 6]
 [1 4 7]
 [0 5 8]]
[[0 5 6]
 [1 4 7]
 [2 3 8]]
[[0 3 7]
 [1 4 8]
 [2 5 6]]
```
Only one column should be shuffled at a time; all the other columns should keep the values of the original matrix.
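To make the requirement concrete, here is a small checker (the name `only_column_changed` is hypothetical, not part of NumPy) that tests whether a candidate matrix differs from the original only by a reordering of column `i`. It is a sketch that assumes 2-D arrays, and it uses the first expected output above as an example.

```python
import numpy as np

def only_column_changed(original, candidate, i):
    """Hypothetical checker: True if `candidate` differs from `original`
    only by a reordering of the values in column i."""
    original = np.asarray(original)
    candidate = np.asarray(candidate)
    if original.shape != candidate.shape:
        return False
    others = np.arange(original.shape[1]) != i          # mask of the untouched columns
    same_elsewhere = np.array_equal(original[:, others], candidate[:, others])
    same_values = np.array_equal(np.sort(original[:, i]), np.sort(candidate[:, i]))
    return same_elsewhere and same_values

X = np.array([[0, 3, 6],
              [1, 4, 7],
              [2, 5, 8]])
expected_first = np.array([[2, 3, 6],
                           [1, 4, 7],
                           [0, 5, 8]])
print(only_column_changed(X, expected_first, 0))  # True
print(only_column_changed(X, expected_first, 1))  # False: column 0 was reordered
```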
Upvotes: 2
Views: 2829
Reputation: 69182
Shuffling a column in numpy can be done in place and requires no copying at all:
```python
import numpy as np

X = np.arange(25).reshape(5, 5)
print(X)
np.random.shuffle(X[:, 2])  # here, X[:, 2] is just a view onto this column of X
print(X)
```
and the output is:
```
[[ 0  1  2  3  4]   # the original
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
[[ 0  1  2  3  4]   # note that the middle column is shuffled here
 [ 5  6 12  8  9]
 [10 11 22 13 14]
 [15 16 17 18 19]
 [20 21  7 23 24]]
```
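A quick way to confirm that `X[:, 2]` is a view, and that shuffling it in place leaves every other column alone, is `np.shares_memory`. The sketch below rebuilds the same 5 by 5 array; the `before` copy exists only so the result can be compared afterwards.

```python
import numpy as np

X = np.arange(25).reshape(5, 5)
col = X[:, 2]                      # basic slicing returns a view, not a copy
print(np.shares_memory(col, X))    # True: the column lives inside X's buffer

before = X.copy()                  # kept only for the comparison below
np.random.shuffle(col)             # permutes the column in place, through the view

others = [0, 1, 3, 4]
print(np.array_equal(X[:, others], before[:, others]))  # True: other columns untouched
print(np.array_equal(np.sort(X[:, 2]), before[:, 2]))   # True: same values, new order
```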
You're doing a lot of copying, and it's hard to tell if any of it is necessary for your overall needs, but it's not required for the shuffle.
Edit:
Although this question is framed in terms of shuffling, shuffling can be done in place, so the real inefficiency is the copying. The question therefore becomes: what copies does the OP actually need? Some copying is unavoidable, of either extra indices or array values, because the original array has to be restored afterwards. The only saving available is to avoid copying the whole array on every iteration and copy just the current column instead (which is basically equivalent to copying the whole matrix once, compared with copying it p times as in the question's code and in @ajcr's answer). The following generator does this one column at a time:
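```python
import numpy as np

def sc(x):
    p = x.shape[1]
    for i in range(p):
        hold = np.array(x[:, i])    # copy only column i so it can be restored
        np.random.shuffle(x[:, i])  # shuffle column i of x in place
        yield x                     # the same array object is yielded every time
        x[:, i] = hold              # restore column i before moving on

for i in sc(X):
    print(i)
```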
which gives:
```
[[ 2  5 11 15 20]   # column 0 shuffled
 [ 3  6 10 16 21]
 [ 0  7 14 17 22]
 [ 4  8 13 18 23]
 [ 1  9 12 19 24]]
[[ 0  5 11 15 20]   # column 1 shuffled
 [ 1  8 10 16 21]
 [ 2  9 14 17 22]
 [ 3  7 13 18 23]
 [ 4  6 12 19 24]]
# etc
```
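One caveat with this generator, sketched below assuming it is used exactly as written: it yields the same array object every time, so anything that stores the yielded values (for example `list(sc(X))`) ends up holding p references to `X`, which has been fully restored by the time the loop finishes. Copying each value as it arrives avoids this, at the cost of one copy per column.

```python
import numpy as np

def sc(x):
    # same idea as the generator above: shuffle column i in place,
    # yield, then restore it from a saved copy of that column only
    for i in range(x.shape[1]):
        hold = x[:, i].copy()
        np.random.shuffle(x[:, i])
        yield x
        x[:, i] = hold

X = np.arange(9).reshape(3, 3).T   # the question's example matrix
original = X.copy()

# Pitfall: every yielded value is the *same* object, so collecting them
# keeps p references to X, which is fully restored once the loop ends.
aliased = list(sc(X))
print(all(a is X for a in aliased))                        # True
print(all(np.array_equal(a, original) for a in aliased))   # True

# If the shuffled versions must outlive the loop, copy each one as it arrives.
snapshots = [m.copy() for m in sc(X)]
```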
On the other hand, if each column shuffle really does need a fresh copy of the entire array, then the copying is where the time goes, and it makes no real difference whether the columns are shuffled one by one or all at once.
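To see where the time goes, the two strategies can be timed side by side. This is a rough sketch: the function names and the 1000 x 100 array size are arbitrary choices for illustration, and timings depend on the machine and the array shape, so no numbers are claimed here.

```python
import timeit
import numpy as np

def copy_each_time(X):
    # the question's approach: a full copy of X for every column
    for i in range(X.shape[1]):
        Xt = X.copy()
        np.random.shuffle(Xt[:, i])
        yield Xt

def restore_column(X):
    # the generator approach: copy only column i, restore it afterwards
    for i in range(X.shape[1]):
        hold = X[:, i].copy()
        np.random.shuffle(X[:, i])
        yield X
        X[:, i] = hold

X = np.random.rand(1000, 100)   # arbitrary size chosen for illustration
for f in (copy_each_time, restore_column):
    # consume the whole generator three times and report the total time
    t = timeit.timeit(lambda: sum(1 for _ in f(X)), number=3)
    print(f.__name__, round(t, 4), "s")
```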
Upvotes: 6
Reputation: 176750
Here's one way to avoid loops completely and build the required array:
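1. Given an array X with n columns, construct an array Y containing n copies of X.
2. Create a mask that selects the i-th column from the i-th copy of X in Y.
3. Using the mask on Y, assign a shuffled copy of X to the selected positions of Y. (`np.random.permutation(X)` reorders whole rows, so every column gets the same permutation; each copy of X in Y still has only one shuffled column.)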
In NumPy it looks like this:
```python
>>> import numpy as np
>>> X = np.arange(9).reshape(3, 3)
>>> X
array([[0, 1, 2],  # an example array
       [3, 4, 5],
       [6, 7, 8]])
>>> Y = X * np.ones((3, 3, 3))
>>> mask = np.zeros_like(Y)
>>> mask[[0, 1, 2], :, [0, 1, 2]] = 1
>>> mask = mask.astype(bool)
>>> Y[mask] = np.random.permutation(X).ravel('F')
>>> Y
array([[[ 6., 1., 2.],  # first column shuffled
        [ 0., 4., 5.],
        [ 3., 7., 8.]],

       [[ 0., 7., 2.],  # second column shuffled
        [ 3., 1., 5.],
        [ 6., 4., 8.]],

       [[ 0., 1., 8.],  # third column shuffled
        [ 3., 4., 2.],
        [ 6., 7., 5.]]])
```
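The same three steps can also be written without a boolean mask by indexing `Y[k, :, k]` directly. This sketch additionally swaps in `Generator.permuted(..., axis=0)` (available in NumPy 1.20 and later, an assumption about your NumPy version), which shuffles each column independently, whereas `np.random.permutation(X)` applies one row order to every column. Using `np.repeat` instead of multiplying by `np.ones` keeps the integer dtype.

```python
import numpy as np

rng = np.random.default_rng()   # Generator API; permuted() needs NumPy >= 1.20

X = np.arange(9).reshape(3, 3)
n, p = X.shape

# One copy of X per column: Y[k] is the k-th copy, still integer-typed.
Y = np.repeat(X[np.newaxis], p, axis=0)      # shape (p, n, p)

# permuted(axis=0) shuffles each column independently of the others.
shuffled = rng.permuted(X, axis=0)           # shape (n, p)

# Y[k, :, k] is column k of copy k. Because the two index arrays are
# separated by a slice, the indexed result has shape (p, n), so assign
# the transpose: row k of shuffled.T is shuffled column k.
k = np.arange(p)
Y[k, :, k] = shuffled.T
print(Y)
```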
Upvotes: 1