Reputation:
What is the quickest way to find the maximum of two floats:
a)
y = std::max(x1, x2);
b)
if (x1 > x2)
y = x1;
else
y = x2;
c)
y = x1 > x2 ? x1 : x2;
Thanks
Upvotes: 3
Views: 9730
Reputation: 27250
You can check it yourself on your system.
I did it for you on gcc - redhat. The results on my system, for 100,000 executions with x1 = 432943.5 and x2 = 434232.9
a) ~1200 uSec b) ~600 uSec c) ~600 uSec
EDIT: With -O2 optimization I got the same results in all 3 cases: ~110 uSec.
Of course, the actual result depends on many factors in your specific problem and system.
Upvotes: 5
Reputation: 13814
Intel x86 has instructions (FCOMI/FCOMIP/ FUCOMI/FUCOMIP) that provide rapid comparison of floating point values. Your CPU may have such instructions, too. The trick is figuring out what C++ to write in order to maximize the chances of your compiler using those instructions instead of doing something slower but more generic.
The optimistic suggestion is to use std::max(float, float) in the hopes that someone else ignored those who taunt "microbenchmark" and "premature optimization" and has done the research necessary to provide a specialization for std::max(float, float) that will use the your hardware's specialized instructions.
Upvotes: 3
Reputation: 754715
Here's a different question
Why do you think such a small optimization will matter in the greater context of your program?
I find it highly unlikely that a micro-optimization such as this will have an perceivable impact on your program. You should never micro-optimize like this unless a profiler has specifically shown this to be a problem.
EDIT Adding some clarification buried in the comments
The reason there is no great answer to this question is that the performance of that code is highly dependent upon ...
Even if all of this information was included, our answers would be guesses at best. The only way to answer this question is to whip out a profiler and find out which is faster.
However this is almost certainly not worth the effort. Micro-optimizing such a small piece of your program will almost certainly not add any noticable preformance benefits to your code. In general it's a really bad idea to optimize code like this unless the profiler specifically tells you it's a problem. Otherwise you'll spend a lot of time optimizing something for no percievable benefit.
Yes there are cases where such an optimization could be important. But that would be only in a very special circumstances where the code was a part of a very tight highly called loop. However the only way to identify such code is to use a profiler.
Upvotes: 24
Reputation: 400274
First, as others have said, profile your code and make absolute sure that this is something worth optimizing. If it is, read on: you can do it without branching. See Down with fcmp: Conditional Moves For Branchless Math for more details.
Upvotes: 2
Reputation:
-O3 Dual core Macbook pro 2.4ghz
std::max(x1, x2) Time: 4.19488 RMAAx's : 4.19613 if Time: 4.18775 ? Time: 4.18831
std::max(x1, x2) Time: 4.1836 RMAAx's : 4.18274 if Time: 4.18603 ? Time: 4.18857
std::max(x1, x2) Time: 4.18714 RMAAx's : 4.18759 if Time: 4.19752 ? Time: 4.19797
std::max(x1, x2) Time: 4.1926 RMAAx's : 4.19293 if Time: 4.19334 ? Time: 4.19626
std::max(x1, x2) Time: 4.18963 RMAAx's : 4.19628 if Time: 4.19253 ? Time: 4.19107
#include <iostream>
using namespace std;
int main (int argc, char * const argv[]) {
uint64_t iterations = 10000000000;
float x1 = 3455.232;
float x2 = 7456.856;
float y = 0;
for (int count = 0; count < 5; ++count)
{
clock_t begin_time = clock();
for (uint64_t ii = 0; ii < iterations; ++ii)
{
y = std::max(x1, x2);
}
std::cout << "std::max(x1, x2) Time: " << float( clock () - begin_time ) / CLOCKS_PER_SEC << endl;
begin_time = clock();
for (uint64_t ii = 0; ii < iterations; ++ii)
{
y = x1;
if (y < x2)
y = x2;
}
std::cout << "RMAAx's : " << float( clock () - begin_time ) / CLOCKS_PER_SEC << endl;
begin_time = clock();
for (uint64_t ii = 0; ii < iterations; ++ii)
{
if (x1 > x2)
y = x1;
else
y = x2;
}
std::cout << "if Time: " << float( clock () - begin_time ) / CLOCKS_PER_SEC << endl;
begin_time = clock();
for (uint64_t ii = 0; ii < iterations; ++ii)
{
y = x1 > x2 ? x1 : x2;
}
std::cout << "? Time: " << float( clock () - begin_time ) / CLOCKS_PER_SEC << endl;
}
return 0;
}
Upvotes: 4
Reputation: 45193
It is common to #define
either b or c as a macro, and use that throughout your code. Note that this only works in most cases, and will die if you pass in an x or y argument modified by a non-idempotent operator or function call with side effects. For example:
#define MAX(x,y) (((x) < (y)) ? (y) : (x))
...
MAX(i++, ++j); //won't work properly, the ++'s will get executed twice.
MAX(changeKSomehow(k), changeLSomehow(L)); //won't work, the functions will get called twice.
It turns out that std::max, at least for GNU libstdc++, is implemented nearly identically, and uses the inline
hint. The compiler should be able to take the hint when appropriate (when the < operator takes few enough instructions not to cause a huge amount of I$ pressure if inlined).
Upvotes: 0
Reputation: 304
And this way may be better?
y = x1;
if (y < x2)
y = x2;
Removing the else condition may be better interpreted by the compiler.
Edit1: If benchmarking don't forget to do half the test with x1 greater then x2 and the other half with x2 bigger. Otherwise the results would not reflect real cases.
Edit2: Microoptimazions are somehow usefull if you work in embedded systems with 1 or 2k of memory. And also, its an intersting problem to think why the timings are different in each case.
Upvotes: 2
Reputation: 27382
The only way to really know is to measure them. They might vary from compiler to compiler or platform to platform.
Run a loop of each one for 100,000 or 500,000 iterations and compare the total execution times.
Upvotes: 2
Reputation: 14341
Same observation as usual, when it comes to "quickest". Did you measure the execution time of you maximum calculation, reported to the rest of execution time of the process? Does this "optimization" decision have a significant impact on the execution time of your application?
I'm 99% sure that the difference between your proposals doesn't worth considering.
Upvotes: 2
Reputation: 14392
It is compiler specific, but I would suspect that the result is the same. Have a look at the assembly output of the compiler.
Upvotes: 4
Reputation: 190945
B and C would compile the same at least in theory. I pick those because unless std::max
is not a function call (e.g. a macro), those would be the fastest.
Edit Apparently, std::max
is a template function call like your form C.
http://www.cplusplus.com/reference/algorithm/max/
Upvotes: 2