Reputation: 5124
I wrote this little program in C++ in order to check CPU load scenarios.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <time.h>
int main()
{
    double x = 1;
    int t1 = GetTickCount();
    srand(10000);
    for (unsigned long i = 0; i < 10000000; i++)
    {
        int r = rand();
        double l = sqrt((double)r);
        x *= log(l/3) * pow(x, r);
    }
    int t2 = GetTickCount();
    printf("Time: %d\r\n", t2-t1);
    getchar();
}
I compiled it both for x86 and for x64 on win7 x64.
For some reason when I ran the x64 version it finished running in about 3 seconds
but when I tried it with the x86 version it took 48 (!!!) seconds.
I tried it many times and always got similar results.
What could cause this difference?
Upvotes: 4
Views: 5914
Reputation: 19
Part of it is definitely SSE, and there's a big reason why x64 uses SSE mode: all AMD64 CPUs are required to have SSE2. Another part could be the increased register count.
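To make the SSE2 point concrete: a 32-bit program cannot assume SSE2 is present and would have to ask at run time, whereas an x64 build can take it for granted. A minimal sketch of such a check, assuming either the Win32 API or a GCC/Clang-style compiler; the function name cpu_has_sse2 is made up for illustration:

```cpp
#if defined(_WIN32)
#include <windows.h>
#endif

// Returns true if the running CPU reports SSE2 support.
// cpu_has_sse2 is a hypothetical helper, not part of any library.
bool cpu_has_sse2()
{
#if defined(_WIN32)
    // Win32: PF_XMMI64_INSTRUCTIONS_AVAILABLE is the SSE2 feature flag.
    return IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE) != 0;
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    // GCC/Clang on x86: query the CPU feature bits.
    return __builtin_cpu_supports("sse2") != 0;
#else
    // Not an x86 target, or no query known to this sketch.
    return false;
#endif
}
```

On any x64 machine this should come back true, which is why a 64-bit compiler can emit SSE2 code unconditionally.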
Upvotes: 1
Reputation: 64068
Looking at the assembler output with /Ox (maximum optimizations), the speed difference between the x86 and x64 builds is obvious:
; cl /Ox /Fa tick.cpp
; x86 Line 17: x *= log(l/3) * pow(x, r)
fld QWORD PTR _x$[esp+32]
mov eax, esi
test esi, esi
; ...
We see that x87 instructions are being used for this computation. Compare this to the x64 build:
; cl /Ox /Fa tick.cpp
; x64 Line 17: x *= log(l/3) * pow(x, r)
movapd xmm1, xmm8
mov ecx, ebx
movapd xmm5, xmm0
test ebx, ebx
; ...
Now we see SSE instructions being used instead.
You can pass /arch:SSE2 to try to coax Visual Studio 2010 into producing similar instructions, but it appears the 64-bit compiler simply produces much faster assembly for the task at hand.
Finally, if you relax the floating point model the x86 and x64 perform nearly identically.
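As a rough illustration of what relaxing the floating-point model permits: under a relaxed model such as /fp:fast the compiler may regroup floating-point operations, and different groupings can round differently. The sketch below only demonstrates that the two groupings disagree; the function names are made up, and the volatile temporaries are there to pin each grouping in place.

```cpp
// Two ways of grouping the same three-term sum. Under a strict model the
// compiler must keep the grouping written in the source; under a relaxed
// model it is free to choose whichever is cheaper.
// sum_left and sum_right are hypothetical names used for illustration.

double sum_left(double a, double b, double c)
{
    volatile double t = a + b;   // (a + b) is rounded first
    return t + c;
}

double sum_right(double a, double b, double c)
{
    volatile double t = b + c;   // (b + c) is rounded first
    return a + t;
}
```

With IEEE doubles, sum_left(0.1, 0.2, 0.3) and sum_right(0.1, 0.2, 0.3) differ in the last bit, so the speed gained from the relaxed model comes at the cost of bit-for-bit reproducibility.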
Timings, unscientific best of 3:
x86 /Ox: 22704 ticks
x64 /Ox: 822 ticks
x86 /Ox /arch:SSE2: 3432 ticks
x64 /Ox /favor:INTEL64: 1014 ticks
x86 /Ox /arch:SSE2 /fp:fast: 834 ticks
Upvotes: 14
Reputation: 8604
The reason is indeed related to SSE. A 64-bit release build in VS generates SSE2 instructions by default, but for a 32-bit build you have to enable them explicitly with the /arch:SSE2 switch. When you do that, you get comparable run times for the 32-bit and 64-bit builds.
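If you want to confirm which mode a given build actually targets, the compiler's predefined macros can tell you. A rough sketch, assuming MSVC's _M_X64 and _M_IX86_FP macros (with __x86_64__ as the GCC/Clang counterpart); fp_codegen is a hypothetical helper name:

```cpp
#include <string>

// Describes the floating-point instruction set this file was compiled for,
// based on predefined compiler macros. fp_codegen is an illustrative name.
std::string fp_codegen()
{
#if defined(_M_X64) || defined(__x86_64__)
    // SSE2 is part of the x64 baseline, so the compiler uses it by default.
    return "x64: SSE2";
#elif defined(_M_IX86_FP)
    // MSVC 32-bit: 0 = x87, 1 = /arch:SSE, 2 = /arch:SSE2 or higher.
    #if _M_IX86_FP >= 2
    return "x86: SSE2";
    #elif _M_IX86_FP == 1
    return "x86: SSE";
    #else
    return "x86: x87";
    #endif
#else
    // Some other target or compiler; this sketch does not try to guess.
    return "unknown";
#endif
}
```

Printing fp_codegen() from the 32-bit build with and without /arch:SSE2 is a quick way to check that the switch took effect.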
Upvotes: 5
Reputation: 490128
Many of the possibilities here have little or nothing to do with x86 vs. x64 as such. One obvious possibility is that most (all?) x64 compilers use SSE for floating point, whereas most normally use 8087-style instructions in x86 mode. Since your code is heavy on floating point, this could make a significant difference.
Another possibility is that in the process of rewriting for x64, they noticed and fixed some problems in their code generator, letting it produce substantially better code, at least under certain circumstances.
Though it doesn't look like it applies here, some code also benefits considerably from the increased size and/or number of registers available in 64-bit mode.
Upvotes: 3