Reputation: 71
I've got a very simple c program which copies all elements from array A to back to array A. For example,
double *A;
A = (double*)malloc(sizeof(double)*SIZE);
for( i = 0; i < SIZE; i++) {
A[i] = A[i];
}
I was expecting this to be optimised out by the compiler and eventually turned into a noop. However, by measuring the runtime of this loop and looking at the assembly code, it seems that the element is indeed loaded from memory into register and then stored back to the same memory location. I have -O3 enabled. Can anyone explain to me why the c compiler does not optimise it? Or am I missing something here?
Many thanks.
Upvotes: 7
Views: 419
Reputation: 47759
To optimize away the loop the compiler had to recognize several things:
Keep in mind that optimizations are oriented towards removing "normal" redundancies, not eliminating "classroom examples".
Upvotes: 2
Reputation: 108986
my gcc (version 4.6.1) optimizes it out
$ cat 7680489.c
#include <stdlib.h> #define SIZE 100 int main(void) { double *a; size_t i; a = calloc(SIZE, sizeof *a); /* initialize elements */ for (i = 0; i < SIZE; i++) a[i] = a[i]; free(a); return 0; }
$ gcc -std=c89 -O3 -S 7680489.c
$ cat 7680489.s
.file "7680489.c" .section .text.startup,"ax",@progbits .p2align 4,,15 .globl main .type main, @function main: .LFB3: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 movl $8, %esi movl $100, %edi call calloc movq %rax, %rdi call free xorl %eax, %eax addq $8, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE3: .size main, .-main .ident "GCC: (Debian 4.6.1-4) 4.6.1" .section .note.GNU-stack,"",@progbits
No loop that I can see. The assembly output is very similar when using malloc
rather than calloc
. I switched to calloc
to avoid having objects with indeterminate values about (thanks R..).
Upvotes: 7
Reputation: 215617
This code has undefined behavior (using objects with indeterminate value) so why would you have any expectation for what it does?
Upvotes: 0
Reputation: 7213
As a general big of info, two of the biggest things that compiler have trouble optimizing are loops and pointers, and your example deals with both. Compilers know that values change frequently in loops, and so they are very conservative when optimizing those. Also, A is a pointer, and compilers are aware that pointers can change due to diverse factors, and so they back off when changing those. That's why the compiler has trouble with your example.
Upvotes: 0
Reputation: 54094
The problem here is that you are using pointers.
It is hard for a compiler to optimise pointers since it can assume that pointer can read/write to any location in memory.
Change to []
array operator instead and try again. You should see the optimization you expect
Upvotes: 0
Reputation: 76753
The compiler does not do any actual thinking.
It can only optimize out stuff that matches a preconceived pattern.
I.e. if the code does not match a known no-op pattern that has been pre-programmed into the compiler it does not get eliminated.
By putting in the A[i] = A[i]
you changed the pattern just enough to not match the empty loop pattern
Upvotes: 1
Reputation: 54634
From a hardware perspective, loading and saving a double is not a no-op; its bitwise value can change if it is one of several trap representations of an IEEE double.
For example, if you load a NaN into a register, it will be written out as the canonical NaN value, which may not be the same bitwise value.
Upvotes: 7