Reputation: 11
I am using the random module to shuffle an array of n elements. I need to do it m times, and I am not completely sure that the shuffling is independent every time. See the example below:
import random

for i in range(10):
    a = list(range(1, 20))
    random.shuffle(a)
    print("\n\nSequence of numbers")
    for item in a:
        print(item)
Can I be completely sure that the second time I shuffle list a the result will be completely independent of the first time?
Looking at the results I have the impression that the output is not independent, but maybe that is only my impression.
For example, an output I get for 4 numbers and 4 repetitions is [1, 3, 2, 4], [1, 3, 2, 4], [4, 1, 3, 2] and [1, 4, 3, 2]. Does this happen by chance? Probably yes, but I want to be sure.
Context: I may want to order the n questions of an exam I am giving to m students, but I want this process to be done independently for every single student.
Upvotes: 1
Views: 223
Reputation: 13171
This is the actual code from Python's random module:
for i in reversed(xrange(1, len(x))):
    # pick an element in x[:i+1] with which to exchange x[i]
    j = _int(random() * (i+1))
    x[i], x[j] = x[j], x[i]
That looks like a proper Fisher-Yates shuffle to me, and it is totally independent of any previous runs.
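For reference, here is a minimal self-contained sketch of the same Fisher-Yates idea (my own illustration of the loop above, not the library's exact code):

import random

def fisher_yates(x):
    # Walk from the last index down, swapping each position with a
    # uniformly chosen index at or before it.
    for i in reversed(range(1, len(x))):
        j = int(random.random() * (i + 1))
        x[i], x[j] = x[j], x[i]

a = list(range(1, 20))
fisher_yates(a)
print(a)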
Upvotes: 0
Reputation: 15877
For practical purposes, successive calls to random.shuffle are independent. It takes log2(N!) bits of state to describe a unique ordering of N elements, and a quick check of random.getstate() shows that the default pseudorandom number generator carries roughly 20000 bits of state. To reach a meaningful overlap we would need to consume all of those bits of entropy.
So we would need M*log2(N!) >= 20000 to have a known (but ridiculously hard to predict) correlation. That is not impossible to imagine; it is about 28 questions for 200 students. Even then, the odds of that correlation being noticeable, given that 28 questions have 28! = 304888344611713860501504000000 possible orderings, are slim.
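As a rough sanity check (my own sketch, assuming the exam numbers above), you can compare the entropy consumed by all the shuffles with the size of the generator state reported by random.getstate():

import math
import random

n_questions = 28   # exam size used above (assumption for illustration)
m_students = 200   # number of students used above (assumption for illustration)

# Entropy needed to pick one of N! orderings is log2(N!) bits.
bits_per_shuffle = math.log2(math.factorial(n_questions))
total_bits = m_students * bits_per_shuffle

# The Mersenne Twister state returned by random.getstate() holds
# 625 32-bit words in its second element (about 20000 bits).
state_bits = len(random.getstate()[1]) * 32

print("bits per shuffle:      %.1f" % bits_per_shuffle)
print("bits for all students: %.0f" % total_bits)
print("generator state bits:  %d" % state_bits)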
Upvotes: 0
Reputation: 88118
You can test this. Note that there are exactly 4! = 24 permutations of the numbers 1, 2, 3, 4. In a random sampling you should expect each of these permutations to come up equally often. To convince yourself that this gives the uniform distribution you are looking for, sample the sequences:
import random, math
from collections import Counter
import pylab as plt

samples = 1000000
a = list(range(1, 5))
C = Counter()

for _ in range(samples):
    random.shuffle(a)
    C[tuple(a)] += 1

permutations = math.factorial(4)
expected = float(samples) / permutations

plt.plot(list(C.values()))
plt.plot([0, permutations], [expected, expected], 'r--')
plt.ylim(0, expected * 2.01)
plt.show()
Note that the red dashed line marks the theoretical expected value and the blue line shows the values we get from sampling. From this I am pretty confident that we are getting a uniform distribution, but we could always use a Kolmogorov-Smirnov test to quantify it. What this doesn't test for is correlation between the sequences. That could be tested with pairs of sequences generated with some time lag, but the Fisher-Yates shuffle used by Python's random.shuffle does a good job of preventing that.
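Along the lines of that last point, here is a rough sketch (my own addition, not part of the test above) that counts pairs of consecutive shuffles to look for a lag-1 correlation; under independence each of the 24 * 24 = 576 ordered pairs should appear about equally often:

import random
from collections import Counter

samples = 1000000
a = list(range(1, 5))
pair_counts = Counter()

random.shuffle(a)
prev = tuple(a)
for _ in range(samples):
    random.shuffle(a)
    cur = tuple(a)
    pair_counts[(prev, cur)] += 1
    prev = cur

# Under independence every ordered pair of permutations is equally likely.
expected = samples / (24 * 24)
worst = max(abs(c - expected) / expected for c in pair_counts.values())
print("distinct pairs seen: %d of 576" % len(pair_counts))
print("largest relative deviation from expected: %.3f" % worst)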
Upvotes: 1