Reputation: 47
The following small programs, which compute the sum of all numbers from 1 to 1 billion, were written in C++ and Java as closely as I could write them. My understanding is that C++ is the "faster" language, but the Java version of this code completes in ~0.5 seconds vs. ~3 seconds for C++.
C++ (GCC Compiler):
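#include <iostream>
using namespace std;

int main() {
    long long x = 0;  // 64-bit accumulator; the sum does not fit in 32 bits
    for (long i = 0; i < 1000000001; i++) {
        x = x + i;
    }
    cout << x << endl;
    return 0;
}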
JAVA:
public class Main {
    public static void main(String[] args) {
        long x = 0;
        for (long i = 0; i < 1000000001; i++) {
            x = x + i;
        }
        System.out.println(x);
    }
}
How would one optimize the C++ code to be as fast as the Java version? Is it even possible?
Upvotes: 1
Views: 4427
Reputation: 46392
This question is a perfect example of what not to do. The whole loop is equivalent to a single assignment and any optimizing compiler knows it. So you're measuring how long it takes to start the program and output a line.
In that case Java must lose by any factor you wish, because running the Java code includes starting the JVM, which is pretty slow. It also includes the optimizing compilation: javac only compiles Java source to Java bytecode and makes no attempt to optimize anything. All the optimizations happen at runtime (bytecode to machine code). [1]
So we can conclude that Java is terribly slow for any task taking less than a few seconds. You can get a factor of 20 or infinity (division by zero) if you try hard enough.
The more important conclusion is that such a comparison makes no sense. See "How do I write a correct micro-benchmark in Java?" if you want a meaningful result.
[1] This holds for desktop Java. On Android, it's different.
Upvotes: 8
Reputation: 18793
Compile the C++ code with the -O option.
Assembly generated without -O contains many memory accesses (slow):
main:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], 0
mov QWORD PTR [rbp-16], 0
.L3:
cmp QWORD PTR [rbp-16], 1000000000
jg .L2
mov rax, QWORD PTR [rbp-16]
add QWORD PTR [rbp-8], rax
add QWORD PTR [rbp-16], 1
jmp .L3
.L2:
Assembly generated with -O only uses registers:
main:
mov eax, 1000000001
.L2:
sub rax, 1
jne .L2
See Godbolt's GCC explorer output: https://godbolt.org/g/rx1Va4
EDIT: In optimized mode, the compiler recognizes that the output is a constant, which is why there is no add instruction. See Nathan's example with output: https://godbolt.org/g/r1PxvL
Upvotes: 3
Reputation: 37045
If you compile with optimizations, then the C++ version is considerably faster.
Java:
$ javac Main.java
$ time java Main
500000000500000000
real 0m0.727s
user 0m0.724s
sys 0m0.004s
C++:
$ clang -O3 main.cpp -o cpp
$ time ./cpp
500000000500000000
real 0m0.003s
user 0m0.000s
sys 0m0.000s
My Clang version:
$ clang --version
clang version 4.0.0-1ubuntu1 (tags/RELEASE_400/rc1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
My Java version:
$ javac -version
javac 1.8.0_144
The unoptimized C++ build was slower because compilers do not optimize by default: optimization is a slow process, and you get quicker compilation times if you turn it off. This is better for development, so it is the default that the Clang developers chose. Java is probably faster than the unoptimized C++ build because it does more optimization at run time. JVM bytecode is not that different from the source code it was compiled from!
Upvotes: 5