Peter234
Peter234

Reputation: 1052

How can a const expr be evaluated so fast

I have been trying out const expressions which are evaluated at compile time. But I played with an example that seems incredibly fast when executed at compile time.

#include<iostream> 

constexpr long int fib(int n) { 
    return (n <= 1)? n : fib(n-1) + fib(n-2); 
} 

int main () {  
    long int res = fib(45); 
    std::cout << res; 
    return 0; 
} 

When I run this code it takes about 7 seconds to run. So far so good. But when I change long int res = fib(45) to const long int res = fib(45) it takes not even a second. To my understanding it is evaluated at compile time. But the compilation takes about 0.3 seconds

How can the compiler evaluate this so quickly, but at runtime it takes so much more time? I'm using gcc 5.4.0.

Upvotes: 13

Views: 461

Answers (2)

Suma
Suma

Reputation: 34393

You may find interesting with 5.4 the function is not completely eliminated, you need at least 6.1 for that.

I do not think there is any caching happening. I am convinced the optimizer is smart enough to prove relationship between fib(n - 2) and fib(n-1) and avoids the second call completely. This is GCC 5.4 output (obtained from godbolt) with no constexpr and -O2:

fib(long):
        cmp     rdi, 1
        push    r12
        mov     r12, rdi
        push    rbp
        push    rbx
        jle     .L4
        mov     rbx, rdi
        xor     ebp, ebp
.L3:
        lea     rdi, [rbx-1]
        sub     rbx, 2
        call    fib(long)
        add     rbp, rax
        cmp     rbx, 1
        jg      .L3
        and     r12d, 1
.L2:
        lea     rax, [r12+rbp]
        pop     rbx
        pop     rbp
        pop     r12
        ret
.L4:
        xor     ebp, ebp
        jmp     .L2

I have to admit I do not understand the output with -O3 - the code generated is surprisingly complex, with lots of memory accesses and pointer arithmetics and it is quite possible there is some caching (memoization) done with those settings.

Upvotes: 1

molbdnilo
molbdnilo

Reputation: 66371

The compiler caches smaller values and does not need to recompute as much as the runtime version does.
(The optimiser is very good and generates lots of code including trickery with special cases that are incomprehensible to me; the naive 2^45 recursions would take hours.)

If you also store previous values:

int cache[100] = {1, 1};

long int fib(int n) {
    int res = cache[n];
    return res ? res : (cache[n] = fib(n-1) + fib(n-2));
} 

the runtime version is much faster than the compiler.

Upvotes: 5

Related Questions