Idov

Reputation: 5124

x64 performance compared to x86

I wrote this little program in C++ in order to check CPU load scenarios.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <time.h>
int main()
{

    double x = 1;
    int t1 = GetTickCount();
    srand(10000);

    for (unsigned long i = 0; i < 10000000; i++)
    {
        int r = rand();
        double l = sqrt((double)r);
        x *= log(l/3) * pow(x, r);
    }

    int t2 = GetTickCount();
    printf("Time: %d\r\n", t2-t1);
    getchar();
}

I compiled it for both x86 and x64 on Windows 7 x64.
For some reason when I ran the x64 version it finished running in about 3 seconds
but when I tried it with the x86 version it took 48 (!!!) seconds.
I tried it many times and always got similar results.
What could cause this difference?

Upvotes: 4

Views: 5914

Answers (4)

Joe Plante

Reputation: 19

Part of it is definitely SSE, and there's a good reason why x64 builds use SSE: all AMD64 CPUs are required to have SSE2. Another part could be the increased register count.

Upvotes: 1

user7116

Reputation: 64068

Looking at the assembler output with /Ox (maximum optimizations), the reason for the speed difference between the x86 and x64 builds is obvious:

; cl /Ox /Fa tick.cpp
; x86 Line 17: x *= log(l/3) * pow(x, r)
fld     QWORD PTR _x$[esp+32]
mov     eax, esi
test    esi, esi
; ...

We see that x87 instructions are being used for this computation. Compare this to the x64 build:

; cl /Ox /Fa tick.cpp
; x64 Line 17: x *= log(l/3) * pow(x, r)
movapd  xmm1, xmm8
mov     ecx, ebx
movapd  xmm5, xmm0
test    ebx, ebx
; ...

Now we see SSE instructions being used instead.

You can pass /arch:SSE2 to try to coax Visual Studio 2010 into producing similar instructions, but it appears the 64-bit compiler simply produces much faster assembly for the task at hand.

Finally, if you relax the floating-point model (/fp:fast) on top of /arch:SSE2, the x86 and x64 builds perform nearly identically.

Timings, unscientific best of 3:

  • x86, /Ox: 22704 ticks
  • x64, /Ox: 822 ticks
  • x86, /Ox /arch:SSE2: 3432 ticks
  • x64, /Ox /favor:INTEL64: 1014 ticks
  • x86, /Ox /arch:SSE2 /fp:fast: 834 ticks
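A "best of 3" measurement like the one above can be sketched as follows. This is only a minimal sketch, not the harness used for the numbers listed: `workload` and `best_of` are hypothetical names, the loop body is copied from the question, and `std::chrono::steady_clock` is swapped in for `GetTickCount` so the code is not tied to Windows.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical stand-in for the loop in the question; the iteration
// count is a parameter so that short runs are possible.
double workload(unsigned long iterations)
{
    double x = 1;
    std::srand(10000);
    for (unsigned long i = 0; i < iterations; i++)
    {
        int r = std::rand();
        double l = std::sqrt(static_cast<double>(r));
        x *= std::log(l / 3) * std::pow(x, r);
    }
    return x; // returned so the optimizer cannot discard the loop
}

// Hypothetical helper: runs the workload `runs` times and returns the
// fastest wall-clock time in milliseconds.
std::int64_t best_of(int runs, unsigned long iterations)
{
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    volatile double sink = 0; // keeps each result observable
    for (int i = 0; i < runs; i++)
    {
        auto start = std::chrono::steady_clock::now();
        sink = workload(iterations);
        auto stop = std::chrono::steady_clock::now();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
        best = std::min<std::int64_t>(best, ms);
    }
    (void)sink;
    return best;
}
```

Calling `best_of(3, 10000000)` from `main` in each build (x86, x64, with and without /arch:SSE2 and /fp:fast) should give numbers comparable in spirit to the list above, though the absolute values will depend on the machine and compiler version.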

Upvotes: 14

Andriy

Reputation: 8604

The reason is indeed related to SSE. A 64-bit release build in VS generates SSE2 instructions by default, but for a 32-bit build you have to enable them explicitly with the /arch:SSE2 switch. When you do that, you get comparable run times for the 32-bit and 64-bit builds.
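To check which mode a given build actually targets, something like the sketch below may help. It relies on predefined macros: MSVC defines `_M_X64` for x64 builds and `_M_IX86_FP` for x86 builds (0 without /arch, 1 for /arch:SSE, 2 for /arch:SSE2 and above), while GCC and Clang define `__x86_64__` and `__SSE2__`. The function name `fp_codegen` is made up for illustration, and the returned strings only describe what the compiler was allowed to emit, not what it emitted for any particular expression.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which floating-point instruction set
// this translation unit was compiled for, based on predefined macros.
std::string fp_codegen()
{
#if defined(_M_X64) || defined(__x86_64__)
    return "x64: SSE2 is part of the baseline";
#elif defined(_M_IX86_FP)
    // MSVC, 32-bit x86 target
    #if _M_IX86_FP >= 2
        return "x86: SSE2 instructions enabled";
    #elif _M_IX86_FP == 1
        return "x86: SSE instructions enabled";
    #else
        return "x86: x87 only";
    #endif
#elif defined(__SSE2__)
    // GCC/Clang, 32-bit target with SSE2 turned on
    return "x86: SSE2 instructions enabled";
#else
    return "unknown target or x87 only";
#endif
}
```

Printing `fp_codegen()` at the start of `main` makes it easy to confirm that the /arch:SSE2 switch really reached the compiler. Note that later Visual Studio versions (2012 onward) default to /arch:SSE2 for x86, so the x87 result is mainly expected with older toolsets such as the one discussed here.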

Upvotes: 5

Jerry Coffin

Reputation: 490128

Many of the possibilities here have little or nothing to do with x86 vs. x64 as such. One obvious possibility is that most (all?) compilers use SSE for floating point in x64 mode, whereas most normally use 8087-style instructions in x86 mode. Since your code is heavy on floating point, this could make a significant difference.

Another possibility is that in the process of rewriting for x64, they noticed and fixed some problems in their code generator, which let it produce substantially better code, at least under certain circumstances.

Though it doesn't look like it applies here, some code also benefits considerably from the increased size and/or number of registers available in 64-bit mode.

Upvotes: 3
