SmacL

Reputation: 22922

Do C and C++ optimizers typically know which functions have no side effects?

Say for very common math functions, such as sin, cos, etc., does the compiler realise they have no side effects and hoist them out of loops? For example:

// Unoptimized

double YSinX(double x,int y)
{
   double total = 0.0;
   for (int i = 0; i < y; i++)
      total += sin(x);
   return total;
}

// Manually optimized

double YSinX(double x,int y)
{
   double total = 0.0, sinx = sin(x);
   for (int i = 0; i < y; i++)
      total += sinx;
   return total;
}

If they can, is there a way of declaring a function as having no side effects, and hence being safe to optimize in this way? Initial profiling of a VS2010 app suggests that the optimization is beneficial.

See also this related question, which is close but doesn't quite answer my own question.

Edit: Some great answers. The one I accepted was based as much on the comments it provoked as on the answer itself, notably the linked article and the fact that hoisting may not occur in situations where errno is set (i.e. a side effect). As such, and in the context of what I'm doing, this type of manual optimization still appears to make sense.

Upvotes: 43

Views: 3368

Answers (4)

adl

Reputation: 16034

GCC has two attributes, pure and const, that can be used to mark such functions. If the function has no side effects and its result depends only on its arguments, it should be declared const; if the result may also depend on some global variable, the function should be declared pure. Recent versions also have a -Wsuggest-attribute warning option that can point out functions which ought to be declared const or pure.
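
For illustration, the declarations might look like this (a minimal sketch; my_square and table_lookup are hypothetical names, not part of the answer):

// Result depends only on the argument and there are no side effects,
// so calls may be freely folded, hoisted, or eliminated.
double my_square(double x) __attribute__((const));

// Result may also read (but never modify) global state, so only the
// weaker "pure" attribute applies.
extern const int table[256];
int table_lookup(int i) __attribute__((pure));

Note that a function which sets errno, as sin may, has an observable side effect and therefore does not strictly qualify for either attribute; that is the same caveat raised in the question's edit and in Ben Voigt's answer below.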

Upvotes: 34

autistic

Reputation: 15642

As a matter of fact, today's common compilers do perform the kind of loop-invariant code motion you're asking about. For a demonstration of this, see the second exercise in the article "Will it Optimize?", or use gcc -S -O3 and/or clang -S -O3 to compile the example below to assembly and inspect the main entry point, as I did out of curiosity. If your VS2010 compiler doesn't perform this optimisation, not to worry; LLVM/Clang "integrates with MSVC 2010, 2012, 2013 and 14 CTP".

From a theoretical standpoint, these two quotes from the C11 standard describe the scope a compiler has when performing optimisations. IIRC C++11 has something similar.

§5.1.2.3p4:

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

§5.1.2.3p6:

The least requirements on a conforming implementation are:

— Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

— At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.

— The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.

This is the observable behavior of the program.

Thus, a compiler might hoist your entire program into compile-time evaluation if it can do so. Consider the following program, for example:

#include <math.h>
#include <stdio.h>

double YSinX(double x,int y)
{
    double total = 0.0;
    for (int i = 0; i < y; i++)
        total += sin(x);
    return total;
}

int main(void) {
    printf("%f\n", YSinX(M_PI, 4));
}

Your compiler might realise that this program prints 0.000000\n every single time, and optimise your program into:

int main(void) { puts("0.000000"); }

That is, provided your compiler can prove that neither sin nor YSinX causes any needed side effects. Note that they may (and probably do) still cause side effects, but those side effects are not needed to produce the output of this program.

To demonstrate that theory applied in practice, I tested both LLVM/Clang (3.8.0, per clang --version) and GCC (6.4.0, per gcc --version) by compiling the code above to assembly with gcc -S -O3 and clang -S -O3 on my Windows 10 system. Both compilers effectively apply the optimisation described above; in practice you can expect main in the example above to be transformed into the machine-code equivalent of int main(void) { printf("%f", 0.0); }.

You've asked a question about "the compiler". If you mean all C or C++ implementations, there are no guaranteed optimisations, and a C implementation need not even be a compiler. You'd need to tell us which particular C or C++ implementation you're using; as explained above, LLVM/Clang "integrates with MSVC 2010, 2012, 2013 and 14 CTP", so it's possible that you might be using that. If your C or C++ compiler doesn't produce optimal code, get a new compiler (e.g. LLVM/Clang) or perform the optimisation yourself, preferably by modifying your compiler so you can send a patch to the developers and have the optimisation automatically propagated to other projects.

Upvotes: 13

Ben Voigt

Reputation: 283624

What is needed to permit hoisting this subexpression outside the loop is not purity, but idempotence.

Idempotence means that a function will have the same side-effects and result if it is called once as if it is called many times with the same arguments. Therefore, the compiler can put the function call outside the loop, protected only by a conditional (would the loop iterate at least once?). The actual code after the hoisting optimization then would be:

double YSinX(double x,int y)
{
   double total = 0.0;
   int i = 0;
   if (i < y) {
       double sinx = sin(x);  // <- the hoisted call goes between the loop
                              //    initialization / first test of the loop
                              //    condition and the loop body
       do {
          total += sinx;
          i++;
       } while (i < y);
   }
   return total;
}

The distinction between __attribute__((pure)) and idempotent is important because, as adl notes in his comment, these functions do have a side effect of setting errno.

Be careful, though, because idempotence only applies to repeated calls with no intervening instructions. The compiler will have to perform dataflow analysis to prove that the function and the intervening code don't interact (for example, the intervening code uses only locals whose addresses are never taken), before it can take advantage of idempotence. This isn't necessary when the function is known to be pure. But purity is a much stronger condition that doesn't apply to very many functions.
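
As a hedged illustration of that caveat (hypothetical code, not from the answer): if the loop body itself inspects errno, the intervening code observes a side effect of each call, and the call can no longer be hoisted:

#include <errno.h>
#include <math.h>

double YSinXChecked(double x, int y)
{
    double total = 0.0;
    for (int i = 0; i < y; i++) {
        errno = 0;
        double s = sin(x);      // may set errno (e.g. EDOM for an infinite x)
        if (errno != 0)         // the loop body observes that side effect,
            break;              // so sin(x) cannot be moved out of the loop
        total += s;
    }
    return total;
}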

Upvotes: 7

aliep

Reputation: 1772

I think so, yes. If you look at the compiler's disassembly output, you can see that sin is called under a different label than the for loop's label (compiled with g++ at -O1/-O2/-O3):

Leh_func_begin1:
        pushq   %rbp
Ltmp0:
        movq    %rsp, %rbp
Ltmp1:
        pushq   %rbx
        subq    $8, %rsp
Ltmp2:
        testl   %edi, %edi
        jg      LBB1_2
        pxor    %xmm1, %xmm1
        jmp     LBB1_4
LBB1_2:
        movl    %edi, %ebx
        callq   _sin ;sin calculated
        pxor    %xmm1, %xmm1
        .align  4, 0x90
LBB1_3:
        addsd   %xmm0, %xmm1
        decl    %ebx
        jne     LBB1_3 ;loops here till i reaches y
LBB1_4:
        movapd  %xmm1, %xmm0

I hope I'm correct.

Upvotes: 6
