Ruslan

Reputation: 19150

Why is this no-op loop not optimized away?

The following code copies one array of zeroes, interpreted as floats, to another one, and prints the timing of this operation. Since I've seen many cases where no-op loops are simply optimized away by compilers, including gcc, I expected that at some point of changing my copy-arrays program it would stop doing the copying.

#include <iostream>
#include <cstring>  // memset
#include <time.h>   // timespec, clock_gettime (POSIX; hence -lrt below)

// Return the CLOCK_MONOTONIC reading in seconds, at nanosecond resolution
static inline long double currentTime()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC,&ts);
    return ts.tv_sec+(long double)(ts.tv_nsec)*1e-9;
}

int main()
{
    size_t W=20000,H=10000;

    float* data1=new float[W*H];
    float* data2=new float[W*H];
    memset(data1,0,W*H*sizeof(float));
    memset(data2,0,W*H*sizeof(float));

    long double time1=currentTime();
    for(int q=0;q<16;++q) // repeat 16 times so the copy takes measurable time
        for(int k=0;k<W*H;++k)
            data2[k]=data1[k];
    long double time2=currentTime();

    std::cout << (time2-time1)*1e+3 << " ms\n";

    delete[] data1;
    delete[] data2;
}

I compiled this with g++ 4.8.1 using the command g++ main.cpp -o test -std=c++0x -O3 -lrt. This program prints 6952.17 ms for me. (I had to set ulimit -s 2000000 for it not to crash.)

I also tried replacing the new-allocated arrays with automatic VLAs and removing the memsets, but this doesn't change g++'s behavior (apart from changing the timings by several times).

It seems the compiler could prove that this code won't do anything sensible, so why didn't it optimize the loop away?
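
To be clear, this is the kind of no-op loop I mean (a hypothetical minimal example, not the program above), which the compilers I've tried delete entirely at -O2 and above, because sum is never observed:

static int deadSum()
{
    int sum=0;
    for(int i=0;i<1000000;++i) // no side effects, result never used...
        sum+=i;
    return 0; // ...so the whole loop is dead code and gets removed
}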

Upvotes: 7

Views: 1449

Answers (4)

manlio

Reputation: 18972

In any case, optimizing it away isn't impossible (clang++ version 3.3):

clang++ main.cpp -o test -std=c++0x -O3 -lrt

The program prints 0.000367 ms for me... and looking at the generated assembly:

...
callq   clock_gettime
movq    56(%rsp), %r14
movq    64(%rsp), %rbx
leaq    56(%rsp), %rsi
movl    $1, %edi
callq   clock_gettime
...

while for g++:

...
call    clock_gettime
fildq   32(%rsp)
movl    $16, %eax
fildq   40(%rsp)
fmull   .LC0(%rip)
faddp   %st, %st(1)
.p2align 4,,10
.p2align 3
.L2:
movl    $1, %ecx
xorl    %edx, %edx
jmp     .L5
.p2align 4,,10
.p2align 3
.L3:
movq    %rcx, %rdx
movq    %rsi, %rcx
.L5:
leaq    1(%rcx), %rsi
movss   0(%rbp,%rdx,4), %xmm0
movss   %xmm0, (%rbx,%rdx,4)
cmpq    $200000001, %rsi
jne     .L3
subl    $1, %eax
jne     .L2
fstpt   16(%rsp)
leaq    32(%rsp), %rsi
movl    $1, %edi
call    clock_gettime
...

EDIT (g++ v4.8.2 / clang++ v3.3)

SOURCE CODE - ORIGINAL VERSION (1)

...
size_t W=20000,H=10000;

float* data1=new float[W*H];
float* data2=new float[W*H];
...

SOURCE CODE - MODIFIED VERSION (2)

...
const size_t W=20000;
const size_t H=10000;

float data1[W*H];
float data2[W*H];
...

With these versions, the only case that isn't optimized away is (1) + g++.
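
For reference, a full sketch of version (2), differing from the question's program only in how the arrays are defined; it assumes the same headers and currentTime() helper as the question, and the two ~760 MiB automatic arrays need a huge stack (hence the ulimit -s 2000000 from the question):

int main()
{
    const size_t W=20000;
    const size_t H=10000;

    // Automatic arrays with constant bounds: the compiler sees their
    // whole lifetime, so it can prove the copy has no observable effect.
    float data1[W*H];
    float data2[W*H];
    memset(data1,0,W*H*sizeof(float));
    memset(data2,0,W*H*sizeof(float));

    long double time1=currentTime();
    for(int q=0;q<16;++q) // take more time
        for(int k=0;k<W*H;++k)
            data2[k]=data1[k];
    long double time2=currentTime();

    std::cout << (time2-time1)*1e+3 << " ms\n";
}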

Upvotes: 8

MSalters

Reputation: 180305

The code in this question has changed quite a bit, invalidating previously correct answers. This answer applies to the 5th version: as the code currently attempts to read uninitialized memory, an optimizer may reasonably assume that unexpected things are happening.

Many optimization steps have a similar pattern: there's a pattern of instructions that's matched to the current state of compilation. If the pattern matches at some point, the matched pattern is (parametrically) replaced by a more efficient version. A very simple example of such a pattern is the definition of a variable that's not subsequently used; the replacement in this case is simply a deletion.

These patterns are designed for correct code. On incorrect code, the patterns may simply fail to match, or they may match in entirely unintended ways. The first case leads to no optimization; the second can lead to totally unpredictable results (certainly if the modified code is further optimized).
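
A minimal sketch of that simple pattern, with illustrative names:

int f(int x)
{
    int unused = x * x; // defined but never read afterwards:
    return x + 1;       // the pattern matches and the multiply is deleted
}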

Upvotes: 3

Stack Overflow is garbage

Reputation: 248279

The only way in which the compiler could know that this is a no-op is if it knew what memset does. In order for that to happen, the function must either be defined in a header (and it typically isn't), or it must be treated as a special intrinsic by the compiler. But barring those tricks, the compiler just sees a call to an unknown function which could have side effects and do different things for each of the two calls.
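
As a sketch of the "visible to the compiler" alternative (zeroFill is a name introduced here for illustration, not a library function), replacing the opaque call with a loop the optimizer can analyze directly:

static void zeroFill(float* p, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        p[i] = 0.0f; // ordinary stores the optimizer can see and reason about
}

// usage: zeroFill(data1, W*H); zeroFill(data2, W*H);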

Upvotes: 0

Konrad Rudolph

Reputation: 546153

Why do you expect the compiler to optimise this? It’s generally really hard to prove that writes to arbitrary memory addresses are a “no-op”. In your case it would be possible, but it would require the compiler to trace the heap memory addresses through new (which is once again hard since these addresses are generated at runtime) and there really is no incentive for doing this.

After all, you tell the compiler explicitly that you want to allocate memory and write to it. How is the poor compiler to know that you’ve been lying to it?

In particular, the problem is that the heap memory could be aliased by lots of other things. It happens to be private to your process, but as I said above, proving this is a lot of work for the compiler, unlike for function-local memory.
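
As a sketch of the aliasing point, using the non-standard __restrict__ extension that g++ and clang++ support: without it the compiler must assume the two ranges might overlap; with it, the programmer promises they don't, which makes the loop far easier to transform.

void copyMayAlias(float* dst, const float* src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i]; // dst and src might overlap; transformations are limited
}

void copyNoAlias(float* __restrict__ dst, const float* __restrict__ src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i]; // overlap ruled out; vectorizing or removing is easier
}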

Upvotes: 0
