Reputation: 9
I think this two program function is almost same but python program runs more than 70 seconds, but c++ runs it really fast. I don't understand what's the difference?
test1.py
import time
x = [(i%10) for i in range(10000)]
y = [(i%10) for i in range(10000)]
start = time.time()
for i in range(10000):
if(i % 100 == 0): print("Current on %d"%i)
for j in range(10000):
r2 = (x[i]-x[j])**2 + (y[i]-y[j])**2
print(time.time()-start)
test2.cpp
#include <iostream>
#include <time.h>
int main(){
float x[10000];
float y[10000];
time_t start, end;
for(int i=0;i<10000;i++){
x[i] = i%10;
y[i] = i%10;
}
start = time(NULL);
for(int i=0;i<10000;i++){
if(i % 100 == 0) std::cout<<"Current on"<<i<<"\n";
for(int j=0;j<10000;j++){
float r2 = (x[i]-x[j])*(x[i]-x[j])+(y[i]-y[j])*(y[i]-y[j]);
}
}
end = time(NULL);
double result = double(end-start);
printf("%f\n",result);
}
Upvotes: 0
Views: 180
Reputation: 982
The other answers have provided some possible reasons, but I wanted to expand upon them.
The very short answer is that regular Python and C++ have different goals and do many things very differently. This means that, even though the programs look similar, they probably have different run-time characteristics.
The short answer is that probably the most significant contributor to the difference is the fact that Python is an interpreted language, while C++ is a compiled language. Therefore, when the C++ program runs, it is already in a format where the instructions contained within it are ready for the processor, so the whole thing runs without interference.
The same is not correct for Python. A Python program is, in fact, something which runs inside another program -- the Python interpreter. What happens is that the interpreter reads your program, figures out the instructions that the processor needs to run to make it happen, and then gets the processor to run those. This can mean that what appears to be a single command in Python can take many processor instructions to carry out, whereas in C++ it may take only one.
The long answer is that it is probably a mixture of points. As C.Nivs indicated, your Python program probably hasn't been written to be the most efficient version it can be. Changing the way it is written has the potential to permit the Python interpreter to change how it decides to go about performing the computations you want. Essentially, writing it a different way might mean that the interpreter can figure out better what you are aiming to achieve, and go about that more directly.
As Quimby said, your C++ program doesn't do anything with the results it computes. This means that (depending on how you run it) the compiler might well end up chopping out parts of the program that will make no difference to the final output. The 'Godbolt' link that Quimby provided appears to suggest that this will happen by default when using the G++ compiler. Therefore, the C++ program is probably performing much fewer computations. It would be worthwhile changing both of your programs to write some sort of output based on the computations at the end. That will both mean that the C++ compiler can't ignore the computations, but also mean that you can confirm that they are actually reaching the same answer.
The above summarises the likely reason for what you see right now. I would still expect C++ to be much faster, even after you make the changes suggested so far. Some reasons for this are listed below:
The C++ program will still have a significant advantage by being compiled ahead-of-time. There's no need for any further translation of your program to the processor's instructions -- that's precisely what the compiler does for you.
Python and C++ represent numbers differently. The way C++ represents them means that your processor can work on them in very few steps. The cost of doing things that way is that C++ can only handle numbers within a particular range. By contrast, Python deliberately represents numbers in a way that makes working on them more complex and slower, but it can handle numbers outside of a given range without the programmer having to change anything.
There is something in modern processors called 'vector instructions' (also frequently known as 'SIMD instructions'). These instructions mean that, instead of working on numbers one-by-one, the processor can work with multiple numbers simultaneously. Depending on precisely what sort of numbers you are working with, this can often mean you get the number crunching running at least 3-7x faster. Vector instructions were made for situations like in your programs where you iterate over one or more arrays and perform numerical computations. I believe that modern C++ compilers are usually able to detect these situations and automatically produce a program that takes advantage of them. In contrast, I don't think Python does anything like that.
I don't think Python's lists are as cache-friendly as C++ arrays. Caches and their effective use is a big topic which I don't want to get into here, but the fundamental idea is that these days, processors usually can carry out their operations faster than the rest of the computer can feed the instructions and data into them from memory. A cache is used to keep a small amount of the instructions and data where the processor can access them relatively quickly. Eventually, data will need to be swapped in and out, but using a cache well means that the processor spends less time waiting for the new data to come in, and spends more time performing useful computations. In certain circumstances, this can have an enormous impact on program speed.
Almost certainly I have forgotten some other reasons, but those are the obvious differences off the top of my head. If you're feeling adventurous, you could look at adjusting your Python program to use something like Numpy. Internally, Numpy lets you work with numbers in a way much closer to how C++ handles them than regular Python, but lets you write your program so that it still looks a lot like standard Python. If you can do that well, you will probably find that the speed difference shrinks a lot.
I hope that this makes sense and is informative :)
Upvotes: 3
Reputation: 19123
There might be a couple of reasons:
Integer
arrays, they are unbounded types, but quite fast, C++ floats will probably be faster.-O1
seems to be enough for gcc, see. With warnings enabled, it even warns that r2
is unused.Upvotes: 6
Reputation: 13106
I'd imagine that the exponential has something to do with it. The exponent in python is surprisingly slow compared to a simple multiplication:
python -m timeit -s "a, b = 100, 3" "(a - b)**2"
1000000 loops, best of 5: 226 nsec per loop
python -m timeit -s "a, b = 100, 3" "(a - b)*(a - b)"
5000000 loops, best of 5: 55.2 nsec per loop
This would be the first major discrepancy between your python and c++.
Also, index-based iteration is slower in Python than direct iteration:
# say we have this:
def one(a, b): # both a and b are assumed to have length 1000
for i in range(1000):
for j in range(1000):
x = a[i] + a[j] + b[i] + b[j]
# versus this
def two(a, b):
for first_a, first_b in zip(a, b):
for sec_a, sec_b in zip(a, b):
x = first_a + first_b + sec_a + sec_b
The timing comes out to:
python -m timeit -s "from otherfile import one, two; a=list(range(1000)); b=list(range(1000,2000))" "one(a, b)"
2 loops, best of 5: 149 msec per loop
python -m timeit -s "from otherfile import one, two; a=list(range(1000)); b=list(range(1000,2000))" "two(a, b)"
5 loops, best of 5: 90.8 msec per loop
So you get a touch over 33% speedup by just directly iterating over your two lists, whereas C++ is able to optimize the indexed loop really well.
So the programs may look similar, but the issue is that the python is not optimal python, so that widens the discrepancy. Writing python to look like another language will often yield sub-optimal python
Upvotes: 3