Improving performance when randomizing a std::list

Question

I have a std::list which I am currently randomizing using a Fisher-Yates shuffle (see http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). To summarize, my code carries out the following steps on the list:

Loop through each element of the list.
Swap the element with a randomly chosen element from the current position onwards, including itself.

Because lists don't provide random access, this means that I am iterating over the entire list in step 1, and for each element I'm iterating again, on average over half the remaining elements from that point onwards. This is a major bottleneck in my program's performance, so I'm looking to improve it. For other reasons I need to continue using list as my container, but I'm considering converting to a vector at the start of my randomize function, and then converting back to list at the end. My lists typically contain 300 - 400 items, so I would guess that the cost of conversion between containers will be worth it to avoid traversing the items sequentially.

My question is: does this seem like the best way to go about optimizing the code? Is there a better way?

Igor ostrovsky · Accepted Answer

One easy improvement is to copy the data into a vector, shuffle the vector, and copy it back into a list. That is what was suggested in comments by Max and PeskyGnat:

vector myVector(myList.size());
copy(myList.begin(), myList.end(), myVector.begin());
random_shuffle(myVector.begin(), myVector.end());
list myListShuffled(myVector.begin(), myVector.end());

This implementation is pretty fast. But, it will do three passes over the vector, and you can get it down to two passes by implementing the shuffle yourself:

vector myVector(myList.size());
int lastPos = 0;
for(list::iterator it = myList.begin(); it != myList.end(); it++, lastPos++) {
    int insertPos = rand() % (lastPos + 1);
    if (insertPos < lastPos) {
        myVector[lastPos] = myVector[insertPos]; 
    }

    myVector[insertPos] = *it;
}

list myListShuffled(myVector.begin(), myVector.end());

Since the first version is much easier to understand and much less error-prone, it's almost always preferable... unless perhaps this bit of code is critical for your performance (and you confirmed that with measurement.)

EDIT: By the way, since you are looking at the Wikipedia article, the second code sample uses the "inside-out" variant of Fisher-Yates.

Improving performance when randomizing a std::list

Answers (1)

Related Questions