augustin-barillec

Reputation: 531

How can gcc -O3 make the run so fast?

Let test_speed.c be the following C code:

#include <stdio.h>
int main(){
    int i;
    for(i=0; i < 1000000000; i++) {}
    printf("%d", i);
}

I run this in the terminal:

gcc -o test_speed test_speed.c

and then:

time ./test_speed

I get:

[screenshot of the `time` output, not preserved]

Now I run the following:

gcc -O3 -o test_speed test_speed.c

and then:

time ./test_speed

I get:

[screenshot of the `time` output, not preserved]

How can the second run be this fast? Is the result already computed during compilation?

Upvotes: 2

Views: 666

Answers (4)

Jean-François Fabre

Reputation: 140307

That's because the aggressive optimization at -O3 determines that

for(i=0; i < 1000000000; i++) {}

has no side effects (apart from the final value of i), so it removes the loop completely and sets i directly to 1000000000.

Disassembly (x86):

00000000 <_main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   83 ec 10                sub    $0x10,%esp
   9:   e8 00 00 00 00          call   e <_main+0xe>
   e:   c7 44 24 04 00 ca 9a    movl   $0x3b9aca00,0x4(%esp)  <== 1000000000 in hex, no loop
  15:   3b
  16:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  1d:   e8 00 00 00 00          call   22 <_main+0x22>
  22:   31 c0                   xor    %eax,%eax
  24:   c9                      leave
  25:   c3                      ret

As you can see, that optimization level is not suitable for calibrated busy-wait loops. (The result is the same with -O2, but with plain -O the loop is not removed.)

Upvotes: 7

dbush

Reputation: 225637

The compiler recognizes that the loop does nothing, and that removing it would not change the output of the program, so the loop was optimized away entirely.

Here's the assembly with -O0:

.L3:
    .loc 1 4 0 is_stmt 0 discriminator 3
    addl    $1, -4(%rbp)
.L2:
    .loc 1 4 0 discriminator 1
    cmpl    $999999999, -4(%rbp)        # loop 
    jle .L3 
    .loc 1 5 0 is_stmt 1
    movl    -4(%rbp), %eax
    movl    %eax, %esi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf
    movl    $0, %eax
    .loc 1 6 0
    leave   
    .cfi_def_cfa 7, 8
    ret

And with -O3:

main:
.LFB23:
    .file 1 "x1.c"
    .loc 1 2 0
    .cfi_startproc
.LVL0:
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
.LBB4:
.LBB5:
    .file 2 "/usr/include/x86_64-linux-gnu/bits/stdio2.h"
    .loc 2 104 0
    movl    $1000000000, %edx      # stored value, no loop
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
.LVL1:
.LBE5:
.LBE4:
    .loc 1 6 0
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret

You can see that in the -O3 case the loop is removed entirely and the final value of i, 1000000000, is stored directly.

Upvotes: 2

user2371524

A compiler only has to preserve the observable behavior of a program. Incrementing a variable with no I/O, no other interaction, and no use of its intermediate values isn't observable. Because your loop does nothing else, the optimizer throws it away completely and assigns the final value directly.

Upvotes: 2

Dirk is no longer here

Reputation: 368599

gcc "knows" that the loop has no body and that nothing depends on any intermediate result, temporary or real, so it removes the loop.

A good tool for this kind of analysis is godbolt.org, which shows you the generated assembly. The difference between no optimization at all and -O3 optimization is stark:

No optimization:

[screenshot of the generated assembly, not preserved]

With -O3:

[screenshot of the generated assembly, not preserved]

Upvotes: 3
