Reputation: 14754
I was trying to measure the speed difference between single-precision and double-precision division in C++.
Here is the simple code that I have written:
#include <iostream>
#include <time.h>
int main(int argc, char *argv[])
{
float f_x = 45672.0;
float f_y = 67783.0;
double d_x = 45672.0;
double d_y = 67783.0;
float f_answer;
double d_answer;
clock_t start,stop;
int N = 200000000; //2*10^8
start = clock();
for (int i = 0; i < N; ++i)
{
f_answer = f_x/f_y;
}
stop = clock();
std::cout<<"Single Precision:"<< (stop-start)/(double)CLOCKS_PER_SEC<<" "<<f_answer <<std::endl;
start = clock();
for (int i = 0; i < N; ++i)
{
d_answer = d_x/d_y;
}
stop = clock();
std::cout<<"Double precision:" <<(stop-start)/(double)CLOCKS_PER_SEC<<" "<< d_answer<<std::endl;
return 0;
}
When I compiled the code without optimization, as g++ test.cpp, I got the following output:
Desktop: ./a.out
Single precision:8.06 0.673797
Double precision:12.68 0.673797
But if I compile this with g++ -O3 test.cpp then I get:
Desktop: ./a.out
Single precision:0 0.673797
Double precision:0 0.673797
How did I get such a drastic performance increase? The time shown in the second case is 0 because of the low resolution of the clock() function. Did the compiler somehow detect that each for loop iteration is independent of the previous iterations?
Upvotes: 2
Views: 359
Reputation: 56088
Looking at the assembly that you get from g++ -O3 -S, it's quite apparent that the loops and all of your floating point calculations (aside from those involving the time) were optimized out of existence:
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB970:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
pushq %rbx
.cfi_def_cfa_offset 24
.cfi_offset 3, -24
subq $24, %rsp
.cfi_def_cfa_offset 48
call clock
movq %rax, %rbx
call clock
movq %rax, %rbp
movl $.LC0, %esi
movl std::cout, %edi
subq %rbx, %rbp
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
See the two calls to clock, one right after the other? And before those, only some stack maintenance instructions. Yep, those loops are completely gone.
You only use f_answer or d_answer to print out an answer that can be trivially calculated at compile time, and the compiler can see that. There's no point in even having them. And if there's no point in having them, there's no point in having f_x, f_y, d_x, or d_y either. All gone.
To solve this, you need to have each iteration of the loop depend on the results from the last iteration. Here is my solution to this problem. I use the complex template to do some of the calculations involved in computing the Mandelbrot set:
#include <iostream>
#include <time.h>
#include <complex>
int main(int argc, char *argv[])
{
using ::std::complex;
using ::std::cout;
const complex<float> f_coord(0.1, 0.1);
const complex<double> d_coord(0.1, 0.1);
complex<float> f_answer(0, 0);
complex<double> d_answer(0, 0);
clock_t start, stop;
const unsigned int N = 200000000; //2*10^8
start = clock();
for (unsigned int i = 0; i < N; ++i)
{
f_answer = (f_answer * f_answer) + f_coord;
}
stop = clock();
cout << "Single Precision: " << (stop-start)/(double)CLOCKS_PER_SEC
<< " " << f_answer << '\n';
start = clock();
for (unsigned int i = 0; i < N; ++i)
{
d_answer = (d_answer * d_answer) + d_coord;
}
stop = clock();
cout << "Double precision: " <<(stop-start)/(double)CLOCKS_PER_SEC
<< " " << d_answer << '\n';
return 0;
}
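The Mandelbrot loop measures complex multiply-add rather than division. If you want to stay closer to the original division benchmark, the same idea can be sketched with a chain of dependent divisions. This is only a rough sketch: time_dependent_division is a hypothetical helper name, the start value and divisor are arbitrary, and the extra addition in the loop body is timed along with the division. The divisor should not be a power of two, since the compiler could then legally replace the division with an exact multiplication.

```cpp
#include <ctime>
#include <utility>

// Hypothetical helper (not from the original posts): performs n divisions in
// which every result feeds the next one, and returns {elapsed seconds, result}.
// Because each iteration needs the previous x, the loop cannot be dropped as
// long as the caller actually uses the returned value.
template <typename T>
std::pair<double, T> time_dependent_division(unsigned int n, T start_value, T divisor)
{
    T x = start_value;
    const std::clock_t start = std::clock();
    for (unsigned int i = 0; i < n; ++i)
    {
        // Loop-carried dependency: the new x is computed from the old x.
        // The "+ 1" keeps x from collapsing to zero; it is part of what is timed.
        x = x / divisor + T(1);
    }
    const std::clock_t stop = std::clock();
    const double seconds = (stop - start) / static_cast<double>(CLOCKS_PER_SEC);
    return {seconds, x};
}
```

Calling time_dependent_division<float>(200000000u, 1.0f, 1.5f) and then the double version from main, and printing both members of each returned pair, keeps the results live. Note that a dependent chain like this measures the latency of the division, not how many independent divisions the CPU can overlap.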
Upvotes: 5
Reputation: 62106
If you add the volatile qualifier in the definitions of your floats and doubles, the compiler won't optimize away the unused calculations.
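A minimal sketch of what that could look like, using the values from the question. The helper name time_volatile_division and its signature are made up for illustration. Keep in mind that volatile forces a memory load or store on every access, so the measured time includes that traffic and not only the division itself.

```cpp
#include <ctime>

// Hypothetical helper (not from the original posts): times n divisions on
// volatile operands and returns the elapsed seconds. The last quotient is
// written to result_out so the caller can print it.
template <typename T>
double time_volatile_division(unsigned int n, T& result_out)
{
    // Reads and writes of volatile objects count as observable behaviour,
    // so the compiler has to perform each one of them.
    volatile T x = T(45672.0);
    volatile T y = T(67783.0);
    volatile T answer = T(0);

    const std::clock_t start = std::clock();
    for (unsigned int i = 0; i < n; ++i)
    {
        answer = x / y;   // two volatile loads, one division, one volatile store
    }
    const std::clock_t stop = std::clock();

    result_out = answer;
    return (stop - start) / static_cast<double>(CLOCKS_PER_SEC);
}
```

From main you would call it once with a float and once with a double, for example time_volatile_division<float>(200000000u, f_answer), and print the returned time next to the answer as in the original program.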
Upvotes: 1
Reputation: 272752
Probably because the compiler optimised the loop away to a single iteration. It may even have done the division at compile-time.
Check the assembler of your executable to be sure (use e.g. objdump).
Upvotes: 7