C++ appears to be significantly slower than both Python Ruby for Project Euler

Question

I have 3 solutions to the following problem from Project Euler.

If p is the perimeter of a right angle triangle with integral length sides, {a,b,c}, there are exactly three solutions for p = 120.

{20,48,52}, {24,45,51}, {30,40,50}

For which value of p ≤ 1000, is the number of solutions maximised?

My three solutions for each language are available below.

C++:

boost::chrono::steady_clock::time_point start_time = boost::chrono::steady_clock::now();
map square_lookup;
for(int i=0; i<= 1500; i++) {
    square_lookup[i*i] = i ;
}
auto end_time = boost::chrono::steady_clock::now();

Python2:

start = time.time()
res = range(1, 1501)
squares = {}
#square_lookups = dict(zip([x*x for x in res], res))
square_lookups = {}
for x in range(1, 1501):
    square_lookups[x*x] = x
end = time.time()

Ruby:

start_time = Time.now

square_lookup = {}
(1 .. 1500).map {|x| x*x}.each_with_index do |square, root|
    square_lookup[square] = root+1
end

end_time = Time.now

Timings on a quadcore i5:

> lookup gen time: 0.00141787528992 
> Python Result: 840 Time:
> 0.282248973846
> 
> Lookup gen time 4640960 nanoseconds 
> C++: Result: 840 : Time: 695301578 nanoseconds
> 
> 
> Lookup gen time 0.000729416
> Ruby: Result: 840 Time: 0.149393345

The lookup gen time is the time it takes to construct a hash table with 1500 elements, with the keys being a perfect square and the values being their respective roots.

Even in this respect, C++ is still SLOWER than both Python and Ruby. I realize I may have the most efficient solution overall and for each language, but using the same types of operations still show C++ as being extremely slow.

IMPORTANT EDIT I changed map to use unordered_map for the C++ solution, however it is still slower!

Modified C++ file: http://pastebin.com/2YyB6Rfm

lookup gen time: 0.00134301185608
Python Result: 840 Time: 0.280808925629

Lookup gen time 2021697 nanoseconds
C++: Result: 840 : Time: 392731891 nanoseconds

Lookup gen time 0.000729313
Ruby: Result: 840 Time: 0.148183345

Jerry Coffin · Accepted Answer

Your code has another serious problem--much more serious than map vs. unordered_map (at least IMO).

In particular, where you do:

int result = square_lookup[(i*i) + (j*j)];

if(result)  {
    int perimeter = i + j + result;
    if(perimeter <= 1000) {
        occurences[perimeter] += 1;
    }
}

This code doesn't just look for the value i*i+j*j in the existing map. Rather, if a key isn't present in the map, it inserts a node in the map with i*i+j*j as the key, and 0 (or, more specifically, a value-initialized object of the map's value_type, which is int in this case) into the map.

Inserting nodes in the map for all those values you don't care about is quite slow. What you're trying to do here is really just check whether the value is already in the map. For that, you can use code like this:

auto result = square_lookup.find(i*i + j*j);

if (result!=square_lookup.end())  {
    int perimeter = i + j + result->second;
    if (perimeter <= 1000) 
        ++occurences[perimeter];                
}

This uses find to find whether a key is in the map. Then if (and only if) the key is in the map, it looks up the value currently associated with that key.

This gives a substantial improvement in speed--to about 20-30 ms with either VC++ or g++.

With this change in place, the difference between map and unordered_map also shrinks away. Code using map can still run in ~20-30 milliseconds. The code using unordered_map may be just a tiny bit faster on average, but my system clock only has 10 ms granularity, so I'd really have to test with more data to say for sure.

For reference, here's the code as I ran it (note that I did some other general cleanup of the code, but nothing else should have any major effect on speed):

#include 
#include 
#include 
#include 
#include 
#include 
#include 

using namespace std;

int main() {
    auto start_time = chrono::steady_clock::now();
    map square_lookup;
    int ctr = 0;
    generate_n(inserter(square_lookup, square_lookup.end()),
        1500,
        [&]() { ++ctr;  return make_pair(ctr*ctr, ctr); });

    auto end_time = chrono::steady_clock::now();

    cout << "Lookup gen time "
        << chrono::duration_cast(end_time - start_time).count() << "
";

    map occurences;
    typedef std::pair const &map_t;

    for (int i = 0; i <= 1000; i++) {
        for (int j = i; j <= 1000; j++) {
            auto result = square_lookup.find(i*i + j*j);

            if (result != square_lookup.end())  {
                int perimeter = i + j + result->second;
                if (perimeter <= 1000)
                    ++occurences[perimeter];
            }
        }
    }

    auto it = std::max_element(occurences.begin(), occurences.end(), 
        [](map_t a, map_t b) { return a.second < b.second; });

    end_time = chrono::steady_clock::now();
    cout << "C++: Result: " << it->first << " : Time: "
        << chrono::duration_cast(end_time - start_time).count() << "
";
}

Summary: in C++ the [] operator on map will insert an item if it's not already present. That can be handy, but it's not always what you want. If you only want to retrieve a value if it's already present, it's not the right tool for the job--and .find can be substantially faster.

Once you correct that problem, the difference between map and unordered_map (at least mostly) disappears.

C++ appears to be significantly slower than both Python Ruby for Project Euler

Answers (2)

Related Questions