Marc
Marc

Reputation: 4477

GCC: Optimizing away memory loads and stores

EDIT 1: Added another example (showing that GCC is, in principle, be capable to do what I want to achieve) and some more discussion at the end of this question.

EDIT 2: Found the malloc function attribute, which should do what. Please take a look at the very end of the question.

This is a question about how to tell the compiler that stores to a memory area are not visible outside of a region (and thus could be optimized away). To illustrate what I mean, let's take a look at the following code

int f (int a)
{
    int v[2];
    v[0] = a;
    v[1] = 0;
    while (v[0]-- > 0)
       v[1] += v[0];
    return v[1];
}

gcc -O2 generates the following assembly code (x86-64 gcc, trunk, on https://godbolt.org):

f:
        leal    -1(%rdi), %edx
        xorl    %eax, %eax
        testl   %edi, %edi
        jle     .L4
.L3:
        addl    %edx, %eax
        subl    $1, %edx
        cmpl    $-1, %edx
        jne     .L3
        ret
.L4:
        ret

As one can see, the loads and stores into the array v are gone after optimization.

Now consider the following code:

int g (int a, int *v)
{
    v[0] = a;
    v[1] = 0;
    while (v[0]-- > 0)
       v[1] += v[0];
    return v[1];
}

The difference is that v is not (stack-) allocated in the function, but provided as an argument. The result of gcc -O2 in this case is:

g:
        leal    -1(%rdi), %edx
        movl    $0, 4(%rsi)
        xorl    %eax, %eax
        movl    %edx, (%rsi)
        testl   %edi, %edi
        jle     .L4
.L3:
        addl    %edx, %eax
        subl    $1, %edx
        cmpl    $-1, %edx
        jne     .L3
        movl    %eax, 4(%rsi)
        movl    $-1, (%rsi)
        ret
.L4:
        ret

Clearly, the code has to store the final values of v[0] and v[1] in memory as they may be observable.

Now, what I am looking for is a way to tell the compiler that the memory pointed to by v in the second example isn't accessible any more after the function g has returned so that the compiler could optimize away the memory accesses.

To have an even simpler example:

void h (int *v)
{
    v[0] = 0;
}

If the memory pointed to by v isn't accessible after h returns, it should be possible to simplify the function to a single ret.

I tried to achieve what I want by playing with the strict aliasing rules but haven't succeeded.

ADDED IN EDIT 1:

GCC seems to have the necessary code built-in as the following example shows:

include <stdlib.h>

int h (int a)
{
    int *v = malloc (2 * sizeof (int));
    v[0] = a;
    v[1] = 0;
    while (v[0]-- > 0)
      v[1] += v[0];
    return v[1];
}

The generated code contains no loads and stores:

h:
        leal    -1(%rdi), %edx
        xorl    %eax, %eax
        testl   %edi, %edi
        jle     .L4
.L3:
        addl    %edx, %eax
        subl    $1, %edx
        cmpl    $-1, %edx
        jne     .L3
        ret
.L4:
        ret

In other words, GCC knows that changing the memory area pointed to by v is not observable through any side-effect of malloc. For purposes like this one, GCC has __builtin_malloc.

So I can also ask: How can user code (say a user version of malloc) make use of this functionality?

ADDED IN EDIT 2:

GCC has the following function attribute:

malloc

This tells the compiler that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P.

Using this attribute can improve optimization. Compiler predicts that a function with the attribute returns non-null in most cases. Functions like malloc and calloc have this property because they return a pointer to uninitialized or zeroed-out storage. However, functions like realloc do not have this property, as they can return a pointer to storage containing pointers.

It seems to do what I want as the following example shows:

__attribute__ (( malloc )) int *m (int *h);

int i (int a, int *h) 
{ 
    int *v = m (h);
    v[0] = a;
    v[1] = 0;
    while (v[0]-- > 0)
        v[1] += v[0];
    return v[1];
}

The generated assembler code has no loads and stores:

i:
        pushq   %rbx
        movl    %edi, %ebx
        movq    %rsi, %rdi
        call    m
        testl   %ebx, %ebx
        jle     .L4
        leal    -1(%rbx), %edx
        xorl    %eax, %eax
.L3:
        addl    %edx, %eax
        subl    $1, %edx
        cmpl    $-1, %edx
        jne     .L3
        popq    %rbx
        ret
.L4:
        xorl    %eax, %eax
        popq    %rbx
        ret

However, as soon as the compiler sees a definition of m, it may forget about the attribute. For example, this is the case when the following definition is given:

__attribute__ (( malloc )) int *m (int *h)
{
    return h;
}

In that case, the function is inlined and the compiler forgets about the attribute, yielding the same code as the function g.

P.S.: Initially, I thought that the restrict keyword may help, but it doesn't seem so.

Upvotes: 4

Views: 502

Answers (3)

Marc
Marc

Reputation: 4477

EDIT: Discussion about the noinline attribute added at the end.

Using the following function definition, one can achieve the goal of my question:

__attribute__ (( malloc, noinline )) static void *get_restricted_ptr (void *p)
{
    return p;
}

This function get_restricted_ptr simply returns its pointer argument but informs the compiler that the returned pointer P cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P.

The use of this function is demonstrated here:

int i (int a, int *h)
{
    int *v = get_restricted_ptr (h);
    v[0] = a;
    v[1] = 0;
    while (v[0]-- > 0)
        v[1] += v[0];
    return;
}

The generated code does not contain loads and stores:

i:
        leal    -1(%rdi), %edx
        xorl    %eax, %eax
        testl   %edi, %edi
        jle     .L6
.L5:
        addl    %edx, %eax
        subl    $1, %edx
        cmpl    $-1, %edx
        jne     .L5
        ret
.L6:
        ret

ADDED IN EDIT: If the noinline attribute is left out, GCC ignores the malloc attribute. Apparently, in this case, the function gets inlined first so that there is no function call any more for which GCC would check the malloc attribute. (One can discuss whether this behaviour should be considered a bug in GCC.) With the noinline attribute, the function doesn't get inlined. Then, due to the malloc attribute, GCC understands that the call to that function is unnecessary and removes it completely.

Unfortunately, this means that the (trivial) function won't be inlined when its call is not eliminated due to the malloc attribute.

Upvotes: 1

Brendan
Brendan

Reputation: 37222

For C, the only restriction is that the compiler has to ensure that the code behaves the same. If the compiler can prove that the code behaves the same then it can and will remove the stores.

For example, I put this into https://godbolt.org/ :

void h (int *v)
{
    v[0] = 0;
}

void foo() {
    int v[2] = {1, 2};
    h(v);
}

And told it to use GCC 8.2 and "-O3", and got this output:

h(int*):
        mov     DWORD PTR [rdi], 0
        ret
foo():
        ret

Note that there are two different versions of the function h() in the output. The first version exists in case other code (in other object files) want to use the function (and may be discarded by the linker). The second version of h() was inlined directly into foo() and then optimised down to absolutely nothing.

If you change the code to this:

static void h (int *v)
{
    v[0] = 0;
}

void foo() {
    int v[2] = {1, 2};
    h(v);
}

Then it tells the compiler that the version of h() that only existed for linking with other object files isn't needed, so the compiler only generates the second version of h() and the output becomes this:

foo():
        ret

Of course all optimizers in all compiler's aren't perfect - for more complex code (and for different compilers including different versions of GCC) results might be different (the compiler may fail to do this optimization). This is purely a limitation of the compiler's optimizer and not a limitation of C itself.

For cases where the compiler's optimiser isn't good enough, there are 4 possible solutions:

  • get a better compiler

  • improve the compiler's optimiser (e.g. send an email with to the compiler's developers that includes a minimal example and cross your fingers)

  • modify the code to make it easier for the compiler's optimiser (e.g. copy the input array into a local array, like "void h(int *v) { int temp[2]; temp[0] = v[0]; temp[1] = v[1]; ... ).

  • shrug and say "Oh, that's a pity" and do nothing

Upvotes: 0

0___________
0___________

Reputation: 67835

Both functions have side effects and memory reads & stores cannot be optimized out

void h (int *v)
{
    v[0] = 0;
}

and

int g (int a, int *v)
{
    v[0] = a;
    v[1] = 0;
    while (v[0]-- > 0)
       v[1] += v[0];
    return v[1];
}

The side effects have to be observable outside the function scope. Inline functions may have another behavior as the side effect might have to be observable outside the enclosing code.

inline int g (int a, int *v)
{
    v[0] = a;
    v[1] = 0;
    while (v[0]-- > 0)
       v[1] += v[0];
    return v[1];
}

void h(void)
{
    int x[2],y ;

    g(y,x);
}

this code will be optimized to just a simple return

You can promise the compiler that nothing will happen to allow easier optimizations by using keyword restrict. But of course your code must keep this promise.

Upvotes: 0

Related Questions