32 Bit vs 64 Bit: Massive Runtime Difference

Question

I am considering the following C++ program:

#include <iostream>
#include <limits>

int main(int argc, char **argv) {
   unsigned int sum = 0;
   for (unsigned int i = 1; i < std::numeric_limits<unsigned int>::max(); ++i) {
      double f = static_cast<double>(i);
      unsigned int t = static_cast<unsigned int>(f);
      sum += (t % 2);
   }
   std::cout << sum << std::endl;
   return 0;
}

I use the gcc / g++ compiler; g++ -v gives gcc version 4.7.2 20130108 [gcc-4_7-branch revision 195012] (SUSE Linux). I am running openSUSE 12.3 (x86_64) on an Intel(R) Core(TM) i7-3520M CPU.

Running

g++ -O3 test.C -o test_64_opt
g++ -O0 test.C -o test_64_no_opt
g++ -m32 -O3 test.C -o test_32_opt
g++ -m32 -O0 test.C -o test_32_no_opt

time ./test_64_opt
time ./test_64_no_opt
time ./test_32_opt
time ./test_32_no_opt

yields

2147483647

real    0m4.920s
user    0m4.904s
sys     0m0.001s

2147483647

real    0m16.918s
user    0m16.851s
sys     0m0.019s

2147483647

real    0m37.422s
user    0m37.308s
sys     0m0.000s

2147483647

real    0m57.973s
user    0m57.790s
sys     0m0.011s

Using float instead of double, the optimized 64 bit variant even finishes in 2.4 seconds, while the other running times stay roughly the same. However, with float I get different outputs depending on optimization, probably due to the higher processor-internal precision.

I know 64 bit may have faster math, but the optimized builds differ by a factor of more than 7 here (and nearly 15 with floats).

I would appreciate an explanation of these running time discrepancies.

us2012 · Accepted Answer

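The problem isn't 32bit vs 64bit as such; it's the lack of SSE and SSE2 in the 32bit build. When compiling for 64bit, gcc assumes it can use SSE and SSE2, since every x86_64 processor has them. A plain -m32 build typically does not make that assumption, so the floating-point conversions in your loop fall back to the older and slower x87 instructions.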

Compile your 32bit version with -msse -msse2 and the runtime difference nearly disappears.

My benchmark results for completeness:

-O3 -m32 -msse -msse2     4.678s
-O3 (64bit)               4.524s
