Reputation: 151036
It seems that this simple shuffle algorithm will produce biased results:
# suppose $arr is filled with 1 to 52
for ($i = 0; $i < 52; $i++) {
$j = rand(0, 51);
# swap the items
$tmp = $arr[$j];
$arr[$j] = $arr[$i];
$arr[$i] = $tmp;
}
You can try it: instead of using 52, use 3 (suppose only 3 cards are used), run it 10,000 times, and tally up the results; you will see that the results are skewed towards certain patterns...
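For reference, here is a quick C# sketch of that tally experiment (just one way to set it up; the class name is arbitrary):
using System;
using System.Collections.Generic;

class NaiveShuffleTally
{
    static void Main()
    {
        Random rand = new Random();
        Dictionary<string, int> counts = new Dictionary<string, int>();

        for (int run = 0; run < 10000; run++)
        {
            int[] arr = { 1, 2, 3 };
            // the naive shuffle: j ranges over ALL positions on every pass
            for (int i = 0; i < arr.Length; i++)
            {
                int j = rand.Next(arr.Length);
                int tmp = arr[j];
                arr[j] = arr[i];
                arr[i] = tmp;
            }

            string key = string.Concat(arr[0], arr[1], arr[2]);
            if (!counts.ContainsKey(key)) counts[key] = 0;
            counts[key]++;
        }

        // some of the 6 patterns come out noticeably more often than others
        foreach (KeyValuePair<string, int> kvp in counts)
            Console.WriteLine(kvp.Key + ": " + kvp.Value);
    }
}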
The question is... what is a simple explanation for why it will happen?
The correct solution is to use something like
for ($i = 0; $i < 51; $i++) { # last card need not swap
$j = rand($i, 51); # don't touch the cards that already "settled"
# swap the items
$tmp = $arr[$j];
$arr[$j] = $arr[$i];
$arr[$i] = $tmp;
}
But the question is... why does the first method, seemingly also totally random, make the results biased?
Update 1: thanks to folks here for pointing out that it needs to be rand($i, 51) for it to shuffle correctly.
Upvotes: 23
Views: 12090
Reputation: 28112
The best explanation I've seen for this effect was from Jeff Atwood on his CodingHorror blog (The Danger of Naïveté).
Using this code to simulate a 3-card random shuffle...
for (int i = 0; i < cards.Length; i++)
{
int n = rand.Next(cards.Length);
Swap(ref cards[i], ref cards[n]);
}
...you get this distribution.
[Chart: tallies of the six orderings produced by the naive 3-card shuffle; 132, 213, and 231 occur more often than the rest (source: typepad.com)]
The shuffle code (above) results in 3^3 (27) possible deck combinations. But the mathematics tell us that there are really only 3! or 6 possible combinations of a 3 card deck. So some of the combinations are over-represented.
You would need to use a Fisher-Yates shuffle to properly (randomly) shuffle a deck of cards.
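For concreteness, a minimal Fisher-Yates sketch in C# (the Deck class name is just illustrative; rand is a System.Random you supply):
using System;

static class Deck
{
    // Fisher-Yates: slot i only ever swaps with slots >= i, so each of the
    // n! orderings is produced by exactly one equally likely path.
    public static void Shuffle(int[] cards, Random rand)
    {
        for (int i = 0; i < cards.Length - 1; i++)
        {
            int n = rand.Next(i, cards.Length); // i <= n < cards.Length
            int tmp = cards[i];
            cards[i] = cards[n];
            cards[n] = tmp;
        }
    }
}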
Upvotes: 15
Reputation: 1379
The clearest way to see that the first algorithm fails is to view it as a Markov chain of n steps on the graph whose n! vertices are the permutations of n natural numbers. The algorithm hops from one vertex to another with transition probability 1/n per hop, so there are n^n possible paths, each with probability 1/n^n. Now suppose the final probability of landing on each vertex were 1/n!. Then for each vertex there would have to be some number m of paths ending there with m/n^n = 1/n!, i.e. n^n = m * n! for some natural number m; in other words, n! would have to divide n^n. But that is impossible for n > 2: n! contains the factor n-1, and since gcd(n, n-1) = 1, n-1 can divide n^n only when n-1 = 1, i.e. n = 2. Contradiction.
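A small C# sketch to confirm the claim numerically (plain long arithmetic, so only small n; the class name is arbitrary):
using System;

class PathCountCheck
{
    static void Main()
    {
        for (int n = 2; n <= 10; n++)
        {
            long paths = 1, perms = 1;
            for (int k = 0; k < n; k++) paths *= n;  // n^n equally likely paths
            for (int k = 2; k <= n; k++) perms *= k; // n! target permutations

            // the remainder is 0 only for n = 2, as the argument predicts
            Console.WriteLine("n=" + n + ": n^n mod n! = " + (paths % perms));
        }
    }
}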
Upvotes: 0
Reputation: 23289
See the Coding Horror post The Danger of Naïveté.
Basically (supposing 3 cards):
The naive shuffle results in 3^3 (27) possible deck combinations. That's odd, because the mathematics tell us that there are really only 3! or 6 possible combinations of a 3 card deck. In the KFY shuffle, we start with an initial order, swap from the third position with any of the three cards, then swap again from the second position with the remaining two cards.
Upvotes: 2
Reputation: 415830
See this:
The Danger of Naïveté (Coding Horror)
Let's look at your three card deck as an example. Using a 3 card deck, there are only 6 possible orders for the deck after a shuffle: 123, 132, 213, 231, 312, 321.
With your 1st algorithm there are 27 possible paths (outcomes) for the code, depending on the results of the rand() function at different points. Each of these outcomes is equally likely (unbiased), and each maps to exactly one result from the list of 6 possible "real" shuffle results above. We now have 27 items and 6 buckets to put them in. Since 27 is not evenly divisible by 6, some of those 6 combinations must be over-represented.
With the 2nd algorithm there are 6 possible outcomes that map exactly to the 6 possible "real" shuffle results, and they should all be represented equally over time.
This is important because the buckets that are over-represented in the first algorithm are not random: the bias is repeatable and predictable. So if you're building an online poker game and use the 1st algorithm, a hacker could figure out that you used the naive shuffle and, from that, work out that certain deck arrangements are much more likely to occur than others. They can then place bets accordingly. They'll lose some, but they'll win much more than they lose and quickly put you out of business.
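As a sanity check, the 2nd algorithm's 6 paths can be enumerated the same way lassevk's answer (further down) enumerates the 27 paths of the 1st; a minimal sketch:
using System;

class CorrectShuffleEnumeration
{
    static void Main()
    {
        // slot 0 swaps with one of slots {0,1,2}; slot 1 with one of {1,2}
        for (int index1 = 0; index1 < 3; index1++)
        {
            for (int index2 = 1; index2 < 3; index2++)
            {
                char[] cards = { '1', '2', '3' };
                Swap(cards, 0, index1);
                Swap(cards, 1, index2);
                // each of the 6 permutations is printed exactly once
                Console.WriteLine(new string(cards));
            }
        }
    }

    static void Swap(char[] a, int i, int j)
    {
        char tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}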
Upvotes: 39
Reputation: 77424
Not that another answer is needed, but I found it worthwhile to try to work out exactly why Fisher-Yates is uniform.
If we are talking about a deck with N items, then this question is: how can we show that
Pr(Item i ends up in slot j) = 1/N?
Breaking it down with conditional probabilities, Pr(item i ends up at slot j) is equal to
Pr(item i is drawn at draw j | item i was not chosen in the first j-1 draws) * Pr(item i was not chosen in the first j-1 draws)
and from there it expands recursively back to the first draw.
Now, the probability that element i was not drawn on the first draw is (N-1)/N. The probability that it was not drawn on the second draw, conditional on not having been drawn on the first, is (N-2)/(N-1), and so on, because each draw removes one item from the pool.
So the probability that element i was not drawn in any of the first j-1 draws is:
((N-1)/N) * ((N-2)/(N-1)) * ... * ((N-j+1)/(N-j+2))
and of course the probability that it is drawn at round j, conditional on not having been drawn earlier, is just 1/(N-j+1), since N-j+1 items remain at round j.
Notice that in the product, each numerator cancels the next term's denominator: N-1 cancels, N-2 cancels, all the way down to N-j+2, leaving just (N-j+1)/N.
So the overall probability of element i appearing in slot j is:
[((N-1)/N) * ((N-2)/(N-1)) * ... * ((N-j+1)/(N-j+2))] * (1/(N-j+1)) = 1/N
as expected.
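If you want to check the result empirically, here is a rough simulation sketch (the deck size N and the trial count are arbitrary choices):
using System;

class UniformityCheck
{
    static void Main()
    {
        const int N = 5;
        const int Trials = 1000000;
        Random rand = new Random();
        int[] landings = new int[N]; // landings[j] = times item 0 ended in slot j

        for (int t = 0; t < Trials; t++)
        {
            int[] a = new int[N];
            for (int k = 0; k < N; k++) a[k] = k;

            for (int i = 0; i < N - 1; i++) // Fisher-Yates
            {
                int j = rand.Next(i, N);
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            }

            landings[Array.IndexOf(a, 0)]++;
        }

        for (int j = 0; j < N; j++) // each fraction should come out near 1/N = 0.2
            Console.WriteLine("slot " + j + ": " + (double)landings[j] / Trials);
    }
}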
To get more general about the "simple shuffle", the particular property that it is lacking is called exchangeability. Because of the "path dependence" of the way the shuffle is created (i.e. which of the 27 paths is followed to create the output), you are not able to treat the different component-wise random variables as though they can appear in any order. In fact, this is perhaps the motivating example for why exchangeability matters in random sampling.
Upvotes: 0
Reputation: 719
The Naive algorithm picks the values of n like so:
n = rand(3)
n = rand(3)
n = rand(3)
3^3 possible combinations of n
1,1,1  1,1,2 ... 3,3,2  3,3,3 (27 combinations). lassevk's answer shows how these combinations are distributed among the card orderings.
The better algorithm does:
n = rand(3)
n = rand(2)
n! possible combinations of n
1,1  1,2  2,1  2,2  3,1  3,2 (6 combinations, each of them giving a different result)
As mentioned in the other answers, if you take 27 attempts to get 6 results, you cannot possibly attain the 6 results with even distribution, since 27 is not divisible by 6. Put 27 marbles into 6 buckets and no matter what you do, some buckets will have more marbles than others, the best you can do is 4,4,4,5,5,5 marbles for buckets 1 through 6.
The fundamental problem with the naive shuffle is that it swaps too many times. To shuffle 3 cards completely you need only do 2 swaps, and the second swap need only be among the first two cards, since the 3rd card already had a 1/3 chance of being swapped. Continuing to swap cards imparts more chances for a given card to be swapped, and these chances only even out to 1/3, 1/3, 1/3 if the total number of swap combinations is divisible by 6.
Upvotes: 0
Reputation: 151036
An illustrative approach might be this:
1) Consider only 3 cards.
2) For the algorithm to give evenly distributed results, the chance of "1" ending up as a[0] must be 1/3, the chance of "2" ending up in a[1] must be 1/3 too, and so forth.
3) So if we look at the second algorithm:
probability that "1" ends up at a[0]: when 0 is the random number generated, so 1 case out of (0, 1, 2); therefore the probability is 1 out of 3 = 1/3
probability that "2" ends up at a[1]: when it didn't get swapped to a[0] the first time, and it didn't get swapped to a[2] the second time: 2/3 * 1/2 = 1/3
probability that "3" ends up at a[2]: when it didn't get swapped to a[0] the first time, and it didn't get swapped to a[1] the second time: 2/3 * 1/2 = 1/3
they are all perfectly 1/3, and we don't see any error here.
4) If we try to calculate the probability of "1" ending up as a[0] in the first algorithm, the calculation will be a bit long, but as the illustration in lassevk's answer shows, it is 9/27 = 1/3; however, "2" ending up as a[1] has a chance of 8/27, and "3" ending up as a[2] has a chance of 9/27 = 1/3.
As a result, "2" ending up as a[1] is not 1/3, and therefore the algorithm produces a pretty skewed result (about a 3.7% error, as opposed to a negligible deviation such as 3/10000000000000 = 0.00000000003%).
5) The proof that Joel Coehoorn has actually can prove that some cases will be over-represented. I think the explanation of why it is n^n is this: at each iteration there are n possibilities for the random number, so after n iterations there are n^n cases = 27. This number doesn't divide the number of permutations (n! = 3! = 6) evenly in the case of n = 3, so some results are over-represented. They are over-represented in the sense that instead of showing up 4 times, they show up 5 times; so if you shuffle the 3 cards 27 million times from the initial order, an over-represented case will show up 5 million times as opposed to 4 million times, which is quite a big difference.
6) I think the over-representation is shown, but "why" will the over-representation happen?
7) An ultimate test for the algorithm to be correct is that any number has a 1/n probability of ending up at any slot.
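Here is a sketch that brute-forces the numbers in point 4 by walking all 27 paths of the first algorithm (the helper names are arbitrary):
using System;

class SlotProbabilities
{
    static void Main()
    {
        int[,] hits = new int[3, 3]; // hits[card, slot] across all 27 paths

        for (int r0 = 0; r0 < 3; r0++)
            for (int r1 = 0; r1 < 3; r1++)
                for (int r2 = 0; r2 < 3; r2++)
                {
                    int[] a = { 1, 2, 3 };
                    Swap(a, 0, r0);
                    Swap(a, 1, r1);
                    Swap(a, 2, r2);
                    for (int slot = 0; slot < 3; slot++)
                        hits[a[slot] - 1, slot]++;
                }

        // card 1 comes out 9/27 in every slot, but card 2 comes out
        // 10/27, 8/27, 9/27 -- not uniform
        for (int card = 0; card < 3; card++)
            Console.WriteLine("card " + (card + 1) + ": "
                + hits[card, 0] + "/27, " + hits[card, 1] + "/27, " + hits[card, 2] + "/27");
    }

    static void Swap(int[] a, int i, int j)
    {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }
}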
Upvotes: 1
Reputation: 39083
From your comments on the other answers, it seems that you are looking not just for an explanation of why the distribution is not the uniform distribution (for which the divisibility answer is a simple one) but also an "intuitive" explanation of why it is actually far from uniform.
Here's one way of looking at it. Suppose you start with the initial array [1, 2, ..., n] (where n might be 3, or 52, or whatever) and apply one of the two algorithms. If all permutations are uniformly likely, then the probability that 1 remains in the first position should be 1/n. And indeed, in the second (correct) algorithm, it is 1/n, as 1 stays in its place if and only if it is not swapped the first time, i.e. iff the initial call to rand(0,n-1) returns 0.
However, in the first (wrong) algorithm, 1 remains untouched only if it is neither swapped the first time nor any other time — i.e., only if the first rand returns 0 and none of the other rands returns 0, the probability of which is (1/n) * (1 - 1/n)^(n-1) ≈ 1/(ne) ≈ 0.37/n, not 1/n.
And that's the "intuitive" explanation: in your first algorithm, earlier items are much more likely to be swapped out of place than later items, so the permutations you get are skewed towards patterns in which the early items are not in their original places.
(It's a bit more subtle than that, e.g. 1 can get swapped into a later position and still end up getting swapped back into place through a complicated series of swaps, but those probabilities are relatively less significant.)
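To see the effect at deck size, here is a rough simulation sketch (n = 52; the trial count is an arbitrary choice):
using System;

class FirstSlotBias
{
    static void Main()
    {
        const int N = 52;
        const int Trials = 1000000;
        Random rand = new Random();
        int stays = 0;

        for (int t = 0; t < Trials; t++)
        {
            int[] a = new int[N];
            for (int k = 0; k < N; k++) a[k] = k + 1;

            for (int i = 0; i < N; i++) // the naive shuffle
            {
                int j = rand.Next(N);
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            }

            if (a[0] == 1) stays++;
        }

        // expect a value near 0.37/N (plus a little from swap-back paths),
        // well below the 1/N that a uniform shuffle would give
        Console.WriteLine("Pr(card 1 stays in slot 0) ~ " + (double)stays / Trials);
        Console.WriteLine("uniform would be 1/N = " + 1.0 / N);
    }
}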
Upvotes: 22
Reputation: 2851
The simple answer is that there are 52^52 possible ways for this algorithm to run, but only 52! possible arrangements of 52 cards. For the algorithm to be fair, it needs to produce each of these arrangements with equal probability. 52^52 is not an integer multiple of 52!. Therefore, some arrangements must be more likely than others.
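You can check the non-divisibility directly with System.Numerics.BigInteger; a small sketch:
using System;
using System.Numerics; // reference System.Numerics

class DivisibilityCheck
{
    static void Main()
    {
        BigInteger paths = BigInteger.Pow(52, 52); // equally likely ways the algorithm can run
        BigInteger arrangements = BigInteger.One;  // 52!
        for (int k = 2; k <= 52; k++) arrangements *= k;

        // prints a nonzero remainder: 52^52 is not a multiple of 52!,
        // so the 52! arrangements cannot all be equally likely
        Console.WriteLine(paths % arrangements);
    }
}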
Upvotes: 1
Reputation: 1432
Here's another intuition: the single shuffle swap can't create symmetry in the probability of occupying a position unless at least 2-way symmetry already exists. Call the three positions A, B, and C. Now let a be the probability of card 2 being in position A, b be the probability of card 2 being in position B, and c be the probability of it being in position C, prior to a swap move. Assume that no two probabilities are the same: a!=b, b!=c, c!=a. Now compute the probabilities a', b', and c' of the card being in these three positions following a swap. Let's say that this swap move consists of position C being swapped with one of the three positions at random. Then:
a' = a*2/3 + c*1/3
b' = b*2/3 + c*1/3
c' = 1/3.
That is, the probability that the card winds up in position A is the probability it was already there times the 2/3 of the time position A isn't involved in the swap, plus the probability that it was in position C times the 1/3 probability that C swapped with A, etc. Now subtracting the first two equations, we get:
a' - b' = (a - b)*2/3
which means that because we assumed a!=b, then a'!=b' (though the difference will approach 0 over time, given enough swaps). But since a'+b'+c'=1, if a'!=b', then neither can be equal to c' either, which is 1/3. So if the three probabilities start off all different before a swap, they will also all be different after a swap. And this would hold no matter which position was swapped - we just interchange the roles of the variables in the above.
Now the very first swap started by swapping card 1 in position A with one of the others. In this case, there was two-way symmetry before the swap, because the probability of card 1 in position B = probability of card 1 in position C = 0. So card 1 can indeed wind up with symmetric probabilities, and it does end up in each of the three positions with equal probability. This remains true for all subsequent swaps. But card 2 winds up in the three positions after the first swap with probability (1/3, 2/3, 0), and likewise card 3 winds up in the three positions with probability (1/3, 0, 2/3). So no matter how many subsequent swaps we do, we will never wind up with card 2 or 3 having exactly the same probability of occupying all three positions.
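For illustration, here is a small sketch that iterates the update rules above for card 2, always using position C as the pivot for simplicity (the class name is arbitrary):
using System;

class AsymmetryIteration
{
    static void Main()
    {
        // card 2's probabilities over positions A, B, C after the first swap
        double a = 1.0 / 3, b = 2.0 / 3, c = 0.0;

        for (int step = 1; step <= 10; step++)
        {
            // update rules from the text: C swaps with a uniformly random position
            double a2 = a * 2.0 / 3 + c / 3;
            double b2 = b * 2.0 / 3 + c / 3;
            double c2 = 1.0 / 3;
            a = a2; b = b2; c = c2;

            // a - b shrinks by a factor of 2/3 each step but never reaches 0
            Console.WriteLine("step " + step + ": a - b = " + (a - b));
        }
    }
}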
Upvotes: 3
Reputation: 391396
Here's the complete probability tree for these replacements.
Let's assume that you start with the sequence 123, and then we'll enumerate all the various ways to produce random results with the code in question.
123
+- 123 - swap 1 and 1 (these are positions,
| +- 213 - swap 2 and 1 not numbers)
| | +- 312 - swap 3 and 1
| | +- 231 - swap 3 and 2
| | +- 213 - swap 3 and 3
| +- 123 - swap 2 and 2
| | +- 321 - swap 3 and 1
| | +- 132 - swap 3 and 2
| | +- 123 - swap 3 and 3
| +- 132 - swap 2 and 3
| +- 231 - swap 3 and 1
| +- 123 - swap 3 and 2
| +- 132 - swap 3 and 3
+- 213 - swap 1 and 2
| +- 123 - swap 2 and 1
| | +- 321 - swap 3 and 1
| | +- 132 - swap 3 and 2
| | +- 123 - swap 3 and 3
| +- 213 - swap 2 and 2
| | +- 312 - swap 3 and 1
| | +- 231 - swap 3 and 2
| | +- 213 - swap 3 and 3
| +- 231 - swap 2 and 3
| +- 132 - swap 3 and 1
| +- 213 - swap 3 and 2
| +- 231 - swap 3 and 3
+- 321 - swap 1 and 3
+- 231 - swap 2 and 1
| +- 132 - swap 3 and 1
| +- 213 - swap 3 and 2
| +- 231 - swap 3 and 3
+- 321 - swap 2 and 2
| +- 123 - swap 3 and 1
| +- 312 - swap 3 and 2
| +- 321 - swap 3 and 3
+- 312 - swap 2 and 3
+- 213 - swap 3 and 1
+- 321 - swap 3 and 2
+- 312 - swap 3 and 3
Now, the fourth column of numbers, the one before the swap information, contains the final outcome, with 27 possible outcomes.
Let's count how many times each pattern occurs:
123 - 4 times
132 - 5 times
213 - 5 times
231 - 5 times
312 - 4 times
321 - 4 times
=============
27 times total
If you run the code that swaps at random for an infinite number of times, the patterns 132, 213 and 231 will occur more often than the patterns 123, 312, and 321, simply because the way the code swaps makes that more likely to occur.
Now, of course, you can say that if you run the code 30 times (27 + 3), you could end up with all the patterns occurring 5 times, but when dealing with statistics you have to look at the long term trend.
Here's C# code that enumerates each of the 27 possible execution paths once and counts the resulting patterns:
using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        // tally of how often each final pattern shows up across all 27 paths
        Dictionary<String, Int32> occurrences = new Dictionary<String, Int32>
        {
            { "123", 0 },
            { "132", 0 },
            { "213", 0 },
            { "231", 0 },
            { "312", 0 },
            { "321", 0 }
        };
        Char[] digits = new[] { '1', '2', '3' };
        // swap returns a copy so that each branch of the enumeration below
        // works on its own independent array
        Func<Char[], Int32, Int32, Char[]> swap = delegate(Char[] input, Int32 pos1, Int32 pos2)
        {
            Char[] result = new Char[] { input[0], input[1], input[2] };
            Char temp = result[pos1];
            result[pos1] = result[pos2];
            result[pos2] = temp;
            return result;
        };
        // enumerate every value the random call could produce at each of the three steps
        for (Int32 index1 = 0; index1 < 3; index1++)
        {
            Char[] level1 = swap(digits, 0, index1);
            for (Int32 index2 = 0; index2 < 3; index2++)
            {
                Char[] level2 = swap(level1, 1, index2);
                for (Int32 index3 = 0; index3 < 3; index3++)
                {
                    Char[] level3 = swap(level2, 2, index3);
                    String output = new String(level3);
                    occurrences[output]++;
                }
            }
        }
        foreach (var kvp in occurrences)
        {
            Console.Out.WriteLine(kvp.Key + ": " + kvp.Value);
        }
    }
}
This outputs:
123: 4
132: 5
213: 5
231: 5
312: 4
321: 4
So while this answer does count, it's not a purely mathematical answer: you just have to enumerate all the possible ways the random function can go and look at the final outputs.
Upvotes: 26