Which of the two shuffling method works better?

Question

I'm trying to randomly shuffle an collection of integers within a List and I have came up with two shuffling method that does the job. However, I'm not sure which one works better? Does any one have any comments or suggestions?

public class TheCollectionInterface {

    public static  void swap(List list1, int i, int j) {
        E temp = list1.get(i);
        list1.set(i, list1.get(j));
        list1.set(j, temp);
    }

// Alternative version:    
//    public static void shuffling(List list, Random rnd){
//        for(int i = list.size(); i >= 1; i--){
//            swap(list, i - 1, rnd.nextInt(i));
//        }
//    }


    public static  void shuffling(List list1, Random rnd) {
        for (int i = list1.size(); i >= 1; i--){
            swap(list1, i - 1, rnd.nextInt(list1.size()));
        }
    }

    public static void main(String[] args) {

        List li2 = Arrays.asList(1,2,3,4,5,6,7,8,9);

        Random r1 = new Random();

        TheCollectionInterface.shuffling(li2, r1);

        System.out.println(li2);
    }
}

Stuart Marks · Accepted Answer

A shuffling algorithm is generally considered to be good if each possible outcome is equally likely. Your "alternative version" of the shuffling algorithm is known as the Fisher-Yates algorithm. Given a sufficiently good source of randomness, this algorithm provably generates each possible permutation of the input with equal probability. The JDK Collections.shuffle() method uses this algorithm.

The uncommented algorithm is theoretically inferior, and this is easy to demonstrate. The explanation of why this is so is given in this Wikipedia article on the Fisher-Yates shuffle. To quote that article, (edited for clarity)

A common error when implementing the Fisher–Yates shuffle is to pick the random numbers from the wrong range. The flawed algorithm may appear to work correctly, but it will not produce each possible permutation with equal probability. Always selecting j from the entire range of valid array indices on every iteration produces a result which is biased, albeit less obviously so. This can be seen from the fact that doing so yields n^n distinct possible sequences of swaps, whereas there are only n! possible permutations of an n-element array. Since n^n can never be evenly divisible by n! when n > 2 (as the latter is divisible by n−1, which shares no prime factors with n), some permutations must be produced by more of the n^n sequences of swaps than others.

In this context, j is destination slot with which element i - 1 is swapped. It might seem that swapping the current element with the full range of elements between 0 and list.size() would provide a better shuffle, as it seems to do "more" shuffling. Indeed there are more different possible sequences than Fisher-Yates, but the extra sequences add bias, reducing the quality of the shuffle.

The Wikipedia article has more details on why this is so, and it shows the probabilities for each outcome of shuffling a 3-element array.

This is pretty easy to demonstrate. Take a 3-element array or list and shuffle it, say, a million times, and count the frequency of occurrences of each of the possible six permutations. I did this with Fisher-Yates, and the results were all within ±0.3% of each other. With the full range shuffle, three of the permutations occurred about 20% more frequently than the other three.

The Fisher-Yates shuffle algorithm, which you labeled "alternative version," is clearly the superior method.

Which of the two shuffling method works better?

Answers (2)

Related Questions