How does GCC optimize out an unused variable incremented inside a loop?

Question

I wrote this simple C program:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

I wanted to see how the gcc compiler optimizes this loop (clearly add 1 2000000000 times should be "add 2000000000 one time"). So:

gcc test.c and then time on a.out gives:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s

$ gcc -O2 test.c and then time ona.out` gives:

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s

Then I disassembled both with gcc -S. First one seems quite clear:

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

L3 adds, L2 compare -4(%rbp) with 1999999999 and loops to L3 if i < 2000000000.

Now the optimized one:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

I can't understand at all what's going on there! I've got little knowledge of assembly, but I expected something like

addl $2000000000, -8(%rbp)

I even tried with gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no more clear that the previous one.

Can someone briefly explain:

The gcc -S -O2 output.
If the loop is optimized as I expected (one sum instead of many sums)?

Mysticial · Accepted Answer

The compiler is even smarter than that. :)

In fact, it realizes that you aren't using the result of the loop. So it took out the entire loop completely!

This is called Dead Code Elimination.

A better test is to print the result:

#include 
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d
", count);
}

EDIT : I've added the required #include ; the MSVC assembly listing corresponds to a version without the #include, but it should be the same.

I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:

EDIT : I had the wrong assembly output. Here's the correct one.

; 57   : int main(){

$LN8:
    sub rsp, 40                 ; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d
",count);

    lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov edx, 2000000000             ; 77359400H
    call    QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

    xor eax, eax

; 72   : }

    add rsp, 40                 ; 00000028H
    ret 0

So yes, Visual Studio does this optimization. I'd assume GCC probably does too.

And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d
"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits

How does GCC optimize out an unused variable incremented inside a loop?

Answers (2)

Related Questions