Reputation: 6180
Contrary to what this question says, this piece of code is exhibiting some weird behaviour:
long long int fibonacci(int num) {
if (num <= 2) return 1;
return fibonacci(num - 1) + fibonacci(num - 2);
}
int main() {
auto t1 = std::chrono::high_resolution_clock::now();
long long int x = fibonacci(45);
auto t2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> time(t2 - t1);
std::cout << "Time taken: " << time.count() << "ms";
}
On my machine, this compiles in ~700ms with -O3
(GCC) and the output is:
Time taken: 2667.55ms
I rewrote the above code with constexpr
as follows:
constexpr long long int fibonacci(int num) {
if (num <= 2) return 1;
return fibonacci(num - 1) + fibonacci(num - 2);
}
int main() {
auto t1 = std::chrono::high_resolution_clock::now();
constexpr long long int x = fibonacci(45);
auto t2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> time(t2 - t1);
std::cout << "Time taken: " << time.count() << "ms";
}
Which compiles in roughly the same time but the output is:
Time taken: 0ms
As it stands, evaluating fibonacci(45)
at compile-time is much faster than evaluating it at runtime. To eliminate multi-core compiling as a reason (which definitely isn't), I re-ran the second block above with fibonacci(60)
. Again, the code compiles in the same amount of time but the output is:
Time taken: 29499.6ms
What causes this big performance gap?
Upvotes: 8
Views: 1719
Reputation: 4835
I'm not convinced by the other answers' explanations.
Here is GCC Team member Jakub Jelink talking about their constexpr cache in a bug ticket. He describes that they're caching constexpr results depending on arguments (among other things).
Relevant code in source mirror. This is a little more involved. According to how the hash table is handled in the code, GCC globally caches the results (per compilation units) of constexpr calls according to constant argument values, template parameters, and object instance (if it's actually a method that can be evaluated on a constexpr object). Although the code dealing with this is quite verbose.
In addition, here's a test supporting the memoization claim by HolyBlackCat:
#include <stdio.h>
static constexpr unsigned long long test(int arg)
{
unsigned long long dummy = 0;
for (unsigned long long i = 0; i < 1000000ULL + arg; ++i)
{
dummy += 1;
}
return dummy;
}
int main(int argc, char **argv)
{
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(5); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(6 /*!*/); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(7 /*!*/); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(8 /*!*/); printf("Result: %llu\n", result); }
{ constexpr unsigned long long result = test(9 /*!*/); printf("Result: %llu\n", result); }
return 0;
}
The idea is to run some constexpr
function that runs in some known amount of real time. On my machine, the function took like 2 seconds. The exact time is not important -- what's important is that the execution time is in fact proportional to the number of loop iterations.
The following observations hint at memoization:
result
lines in main is present, then the compile time is about some value x.result
lines with argument 5
in test(5)
are present (but none of the lines with other arguments), the compile time is the same value x.result
lines with arguments 6, 7, 8, 9
are present as well, then the compile time is about 5 times x.Also keep in mind that, while the compiler is perfectly optimize the loop to a simple addition in the generated code when compiling with -O2
, it completely fails to do any sort of optimization during constexpr evaluation, as is evident by the absurdly long compile times. If it were to do any nontrivial optimizations at that stage, it would surely be able to collapse the loop. Guesswork: Having contributed code to compilers before I would consider it plausible that the pipeline architecture is actually way too complicated for full optimization on constexpr code that gets evaluated at compile time, and that it may actually be evaluating an expression tree which is very close to the original AST in structure.
Tested with g++ -std=c++14
gcc version 5.4.0.20160609 running on linux mint.
Upvotes: 2
Reputation: 15277
Basically, for that short piece of code, compile time does not matter.
Even, if you do compile time evaluation.
The main problem is here the utmost bad algorithm used here. Using 2 recursive calls which will then call again 2 recursive functions and so on and so on has really the worst case time complexity for such a simple algorithm.
That problem can and must be solved with an iterative approach.
Something like that:
unsigned long long fibonacci(size_t index) noexcept {
unsigned long long f1{ 0ull }, f2{ 1ull }, f3{};
while (index--) { f3 = f2 + f1; f1 = f2; f2 = f3; }
return f2;
}
If you use this function, then, reagrdless of optimization or not, your main will run in below 1ms.
In you original main function, if you do not use the result of the calculation, so the variable "X", then optimizing compiler can eliminate the complete function call. It is not necessary to call that function. The result is not used.
Make the following experiment.
Add std::cout << x << '\n';
as the last statement in your main function. You will be surprised. With enabled optimization, the function will be called at the very end. And you time measurement measures nothing.
Now to the compiler time version. Also the compiler will internally used optimized code. And it will convert the nonesense recursive approach to an iterative approach. And to calculate that values will take less time than the compiler overhead functions.
So, that is the real reason.
And the Fibonacci numbers can always be compiled fully at compile time. There are only 93 possible results for an 64 bit result value.
See the following approach, which creates a compile time array of all Fibonacci numbers. And if you want the n'th number, it is just a lookup value.
This will work very fast in either non-optimized or optimized mode.
It is the fasted possible solution for that problem.
#include <iostream>
#include <utility>
#include <array>
#include <algorithm>
#include <iterator>
// All done during compile time -------------------------------------------------------------------
constexpr unsigned long long getFibonacciNumber(size_t index) noexcept {
unsigned long long f1{ 0ull }, f2{ 1ull }, f3{};
while (index--) { f3 = f2 + f1; f1 = f2; f2 = f3; }
return f2;
}
// Some helper to create a constexpr std::array initilized by a generator function
template <typename Generator, size_t ... Indices>
constexpr auto generateArrayHelper(Generator generator, std::index_sequence<Indices...>) {
return std::array<decltype(std::declval<Generator>()(size_t{})), sizeof...(Indices) > { generator(Indices)... };
}
template <size_t Size, typename Generator>
constexpr auto generateArray(Generator generator) {
return generateArrayHelper(generator, std::make_index_sequence<Size>());
}
constexpr size_t MaxIndexFor64BitFibonacci = 93;
// This is the definition of a std::array<unsigned long long, 93> with all Fibonacci numbers in it
constexpr auto Fib = generateArray<MaxIndexFor64BitFibonacci>(getFibonacciNumber);
// End of: All done during compile time -----------------------------------------------------------
// The only code executed during runtime
int main() {
// Show all Fibonacci numbers up to border value
for (const auto fib : Fib) std::cout << fib << '\n';
}
Upvotes: 6
Reputation: 14589
I'm not sure why this question is even appears unless you didn't exactly read up how constexpr is described. You can't work by "guessing" what's going on in program written in C++ unless the behaviour is part of UB or platform-defined.
C++ standard doesn't describe memoization in any way, thus we have no reason to suspect that such complex caching procedure may be used. Languages which support memoization (or things called "enterring", "tombstoning", and similar) are usually the same that are running intermediate code on some virtual machine and de-facto do not provide mechanics for compile-time evaluation of an arbitrary function as such never can be done for unknown platform.
constexpr
allows creation of "constants" which are initialized through code executed on compiler's side (with all possible platform-defined effects and all restrictions applied in related standard).
Without it as-if rule applies, which includes a freedom for compiler not to do anything about changing code. Compiler is also allowed to throw whole thing way if result is not observable and it is clear that there is no side-effects of function call, which is easier to check when constexpr
is used, i.e. there are no side-effects by definition.
In constexpr
version whole execution stack would be unwinded because you explicitly told compiler that it is possible. The limitations on constexpr
functions imposed by standard guarantee that function would be fit for such evaluation.
There can be unexpected results when cross-compiling for platform which got behaviour different from host one: you may get constexpr
values for compiler-side platform.
With recursive constexpr
functions you risk to exceed limit of recursions which is implementation-defined. E.g. your case wouldn't compile with some builds of g++ at all as you have more than 32 millions of iterations there due non-optimal recursion. THe way recursion is done, there is no memoization is possible on compile time.
When it is possible? If f(x2)
is calling f(x1)
, after f(x1)
was called, evaluation of f(x2)
might use already known value of f(x1)
. In your case fibonacci(x)
calls fibonacci(x-1)
and fibonacci(x-2)
for any given x
greater than 2. Those two values weren't evaluated before. It could look so that fibonacci(x-1)
may reuse or provide fibonacci(x-2)
's value, but depending on C++ standard's version it's either:
A.) order of execution is undefined;
B.) fibonacci(x-1)
is a
context different from context of fibonacci(x)
.
In second case no code is executed between calls to chrono
aside of initialization of x
with value calculated by compiler. If you used static constexpr long long int x
for the local variable, then even initialization would be gone, it would be a hard-coded constant value somewhere in data or in code where it's used. static
formally prolongs life of object till the end of process, while an automatic constexpr variable would be initialized each time the function was entered (irrelevant to fact that main() should be entered only once).
Upvotes: 3