Rostislav
Rostislav

Reputation: 3977

Why is std::shuffle as slow (or even slower than) std::sort?

Consider the simple code that measures execution time and the number of swaps performed:

#include <iostream>

#include <vector>
#include <random>
#include <chrono>
#include <algorithm>

struct A {
    A(int i = 0) : i(i) {}
    int i;
    static int nSwaps;

    friend void swap(A& l, A& r)
    {
        ++nSwaps;
        std::swap(l.i, r.i);
    }

    bool operator<(const A& r) const
    {
        return i < r.i;
    }
};

int A::nSwaps = 0;

using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::milliseconds;


int main()
{
    std::vector<A> v(10000000);

    std::minstd_rand gen(std::random_device{}());
    std::generate(v.begin(), v.end(), [&gen]() {return gen();});

    auto s = high_resolution_clock::now();
    std::sort(v.begin(), v.end());
    std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count() 
        << "ms with " << A::nSwaps << " swaps\n";

    A::nSwaps = 0;
    s = high_resolution_clock::now();
    std::shuffle(v.begin(), v.end(), gen);
    std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count() 
        << "ms with " << A::nSwaps << " swaps\n";
}

The output of the program depends on the compiler and the machine, but they are quite similar in their nature. On my laptop with VS2015, I get 1044ms with ~100 million swaps for sort and 824ms with 10 million swaps for shuffle.

libstdc++ and libc++ do twice as few swaps for sort (~50M) and the results are as follows. Rextester gives me similar results: gcc sort 854ms, shuffle 565ms, clang sort 874ms, shuffle 648ms. The results shown by ideone and coliru are even more drastic: ideone sort 1181ms, shuffle 1292ms and coliru sort 1157ms, shuffle 1461ms.

So what's the culprit here? Why with 5 to 10 times more swaps sort is almost as fast or even faster than a simple shuffle? I'm not even taking into account comparisons and more complex logic in std::sort including choosing insertion, heap or quick sort algorithms, etc. I doubt it's the random engine - I've even chosen the simplest one std::minstd_rand which basically does an integer multiplication and a modulo. Is it the cache misses that make shuffle relatively slow?

PS: the behaviour is the same for simple std::vector<int>

Upvotes: 5

Views: 2447

Answers (2)

TemplateRex
TemplateRex

Reputation: 70536

First, std::sort is not required to use an unqualified swap. It's not a customization point, and you cannot rely on your own user-defined swap being found through ADL. But even it would, sort can also use std::rotate, which can do swap but also memmove. This would not be counted by your implementation.

Second, the Standard Library only specifies asymptotic complexity, which is O(N) for std::shuffle and O(N log N) for std::sort. So you should measure for different values of N (e.g. powers of 2 from 65K to 65M amounts of elements) and measure the scaling behavior. For small N, the constant of proportionality of sort could be much smaller than the one for shuffle since it has to call a potentially expensive random generator.

Update: it indeed appears that constant factors and/or cache-effects are the culprit (as pointed out by @stgatilov). See this DEMO where I run std::sort on the data after std::shuffle has been called. Runtime for sort is about half of that of shuffle, with 5x more swaps.

Upvotes: 2

stgatilov
stgatilov

Reputation: 5533

std::random_shuffle usually works as follows:

//random(k) generates uniform random from 0 to k-1 inclusive
for (int i = 1; i < n; i++)
  swap(arr[i], arr[random(i + 1)]);

So we can see two sources of inefficiency here:

  1. Random number generators are often quite slow.
  2. Each swap uses a totally random element from the vector. When the data size is large, the whole vector does not fit into CPU cache, so each such access has to wait until the data is read from RAM.

Speaking of point 2, sorting algorithms like quicksort are much more cache-friendly: most of their memory accesses hit cache.

Upvotes: 6

Related Questions