Reputation: 740
I was trying to write a solution for Problem 12 (Project Euler) in Python. The solution was just too slow, so I tried checking up other people's solution on the internet. I found this code written in C++ which does virtually the same exact thing as my python code, with just a few insignificant differences.
Python:
def find_number_of_divisiors(n):
if n == 1:
return 1
div = 2 # 1 and the number itself
for i in range(2, n/2 + 1):
if (n % i) == 0:
div += 1
return div
def tri_nums():
n = 1
t = 1
while 1:
yield t
n += 1
t += n
t = tri_nums()
m = 0
for n in t:
d = find_number_of_divisiors(n)
if m < d:
print n, ' has ', d, ' divisors.'
m = d
if m == 320:
exit(0)
C++:
#include <iostream>
int main(int argc, char *argv[])
{
unsigned int iteration = 1;
unsigned int triangle_number = 0;
unsigned int divisor_count = 0;
unsigned int current_max_divisor_count = 0;
while (true) {
triangle_number += iteration;
divisor_count = 0;
for (int x = 2; x <= triangle_number / 2; x ++) {
if (triangle_number % x == 0) {
divisor_count++;
}
}
if (divisor_count > current_max_divisor_count) {
current_max_divisor_count = divisor_count;
std::cout << triangle_number << " has " << divisor_count
<< " divisors." << std::endl;
}
if (divisor_count == 318) {
exit(0);
}
iteration++;
}
return 0;
}
The python code takes 1 minute and 25.83 seconds on my machine to execute. While the C++ code takes around 4.628 seconds. Its like 18x faster. I had expected the C++ code to be faster but not by this great margin and that too just for a simple solution which consists of just 2 loops and a bunch of increments and mods.
Although I would appreciate answers on how to solve this problem, the main question I want to ask is Why is C++ code so much faster? Am I using/doing something wrongly in python?
Replacing range with xrange:
After replacing range with xrange the python code takes around 1 minute 11.48 seconds to execute. (Around 1.2x faster)
Upvotes: 1
Views: 3154
Reputation: 56829
Rewriting the divisor counting algorithm to use divisor function makes the run time reduces to less than 1 second. It is still possible to make it faster, but not really necessary.
This is to show that: before you do any optimization trick with the language features and compiler, you should check whether your algorithm is the bottleneck or not. The trick with compiler/interpreter is indeed quite powerful, as shown in Dougal's answer where the gap between Python and C++ is closed for the equivalent code. However, as you can see, the change in algorithm immediately give a huge performance boost and lower the run time to around the level of algorithmically inefficient C++ code (I didn't test the C++ version, but on my 6-year-old computer, the code below finishes running in ~0.6s).
The code below is written and tested with Python 3.2.3.
import math
def find_number_of_divisiors(n):
if n == 1:
return 1
num = 1
count = 1
div = 2
while (n % div == 0):
n //= div
count += 1
num *= count
div = 3
while (div <= pow(n, 0.5)):
count = 1
while n % div == 0:
n //= div
count += 1
num *= count
div += 2
if n > 1:
num *= 2
return num
Upvotes: 4
Reputation: 28856
This is exactly the kind of code where C++ is going to shine compared to Python: a single fairly tight loop doing arithmetic ops. (I'm going to ignore algorithmic speedups here, because your C++ code uses the same algorithm, and it seems you're explicitly not asking for that...)
C++ compiles this kind of code down to a relatively few number of instructions for the processor (and everything it does probably all fits in the super-fast levels of CPU cache), while Python has a lot of levels of indirection it's going through for each operation. For example, every time you increase a number it's checking that the number didn't just overflow and need to be moved into a bigger data type.
That said, all is not necessarily lost! This is also the kind of code that a just-in-time compiler system like PyPy will do well at, since once it's gone through the loop a few times it compiles the code to something similar to what the C++ code starts at. On my laptop:
$ time python2.7 euler.py >/dev/null
python euler.py 72.23s user 0.10s system 97% cpu 1:13.86 total
$ time pypy euler.py >/dev/null
pypy euler.py > /dev/null 13.21s user 0.03s system 99% cpu 13.251 total
$ clang++ -o euler euler.cpp && time ./euler >/dev/null
./euler > /dev/null 2.71s user 0.00s system 99% cpu 2.717 total
using the version of the Python code with xrange
instead of range
. Optimization levels don't make a difference for me with the C++ code, and neither does using GCC instead of Clang.
While we're at it, this is also a case where Cython can do very well, which compiles almost-Python code to C code that uses the Python APIs, but uses raw C when possible. If we change your code just a little bit by adding some type declarations, and removing the iterator since I don't know how to handle those efficiently in Cython, getting
cdef int find_number_of_divisiors(int n):
cdef int i, div
if n == 1:
return 1
div = 2 # 1 and the number itself
for i in xrange(2, n/2 + 1):
if (n % i) == 0:
div += 1
return div
cdef int m, n, t, d
m = 0
n = 1
t = 1
while True:
n += 1
t += n
d = find_number_of_divisiors(t)
if m < d:
print n, ' has ', d, ' divisors.'
m = d
if m == 320:
exit(0)
then on my laptop I get
$ time python -c 'import euler_cy' >/dev/null
python -c 'import euler_cy' > /dev/null 4.82s user 0.02s system 98% cpu 4.941 total
(within a factor of 2 of the C++ code).
Upvotes: 8
Reputation: 49043
Here's my own variant built on nhahtdh's factor-counting optimization plus my own prime factorization code:
def prime_factors(x):
def factor_this(x, factor):
factors = []
while x % factor == 0:
x /= factor
factors.append(factor)
return x, factors
x, factors = factor_this(x, 2)
x, f = factor_this(x, 3)
factors += f
i = 5
while i * i <= x:
for j in (2, 4):
x, f = factor_this(x, i)
factors += f
i += j
if x > 1:
factors.append(x)
return factors
def product(series):
from operator import mul
return reduce(mul, series, 1)
def factor_count(n):
from collections import Counter
c = Counter(prime_factors(n))
return product([cc + 1 for cc in c.values()])
def tri_nums():
n, t = 1, 1
while 1:
yield t
n += 1
t += n
if __name__ == '__main__':
m = 0
for n in tri_nums():
d = factor_count(n)
if m < d:
print n, ' has ', d, ' divisors.'
m = d
if m == 320:
break
Upvotes: 1