Reputation: 3461
I was recently curious to see roughly how many integer increments C++ could handle in a second. To test this, I wrote a short driver program, which is shown below:
#include <iostream>
using namespace std;

int main()
{
    int num = 0;
    while (++num) {
        if (num % 100000000 == 0) { // prints num every 100 million increments
            cout << num << endl;
        }
    }
    return 0;
}
When I compiled this code under g++ 7.5.0 with optimization -O3, the program managed to increment approximately 800,000,000 times a second.
But when I changed the type of num from int to long long, performance degraded severely, to around 100,000,000 increments a second.
Can someone explain why this difference occurs?
Upvotes: 0
Views: 517
Reputation: 155604
Signed division is accomplished with the IDIV instruction. Per Agner Fog's instruction tables, on the Haswell architecture the reciprocal throughput of IDIV is 8-11 for 32-bit registers, but 24-81 for 64-bit registers. That is, 64-bit integer division takes roughly 2x to 10x longer than 32-bit division. The numbers vary by architecture, and have a wide range even for Haswell specifically, but an 8x loss in performance seems plausible. It's not the increment (INC has a fixed and absurdly fast cost; apparently it can be dispatched four times per clock cycle); it's the % 100000000 test you use to limit the amount of output that gets slower with the larger operand size.
Perhaps try replacing it with a mask, printing at multiples of a large power of 2 instead of a power of 10 (AND is ridiculously cheap and not tied to register size), e.g.:

    if ((num & ((1 << 27) - 1)) == 0)
If you really like powers of ten, you could always spring for an upgrade to Ice Lake; there the difference is only a reciprocal throughput of 6 vs. 10, so the performance loss would be less than 2x.
Upvotes: 5