Reputation: 2075
Say I were to allocate two memory blocks. I use the first memory block to store something and then use that stored data. Then I use the second memory block to do something similar:
{
    int a[10];
    int b[10];
    setup_0(a);
    use_0(a);
    setup_1(b);
    use_1(b);
}
Does the compiler optimize this to the following?
{
    int a[10];
    setup_0(a);
    use_0(a);
    setup_1(a);
    use_1(a);
}
// the setup functions overwrite all 10 words
The question is now: Do compilers optimize this, so that they reuse the existing memory blocks, instead of allocating a second one, if the compiler knows that the first block will not be referenced again?
If this is true: Does this also work with dynamic memory allocation? Is this also possible if the memory persists outside the scope, but is used in the same way as given in the example? I assume this only works if setup and foo are implemented in the same c file (exist in the same object as the calling code)?
Upvotes: 6
Views: 581
Reputation: 85
The short answer is: No! The compiler cannot optimize this code to what you suggested, because it is not semantically equivalent.
Long explanation: The lifetime of a and b is, with some simplification, the complete block. So now let's assume that one of setup_0 or use_0 stores a pointer to a in some global variable. Then setup_1 and use_1 are allowed to use a via this global variable in combination with b (they can, for example, add the array elements of a and b). If the transformation you suggested were done, this would result in undefined behaviour. If you really want to make a statement about the lifetimes, you have to write the code in the following way:
{
    { // Lifetime block for a
        char a[100];
        setup_0(a);
        use_0(a);
    } // Lifetime of a ends here, so none of the functions called below
      // is allowed to access it. If one does access it by accident,
      // it is undefined behaviour.

    char b[100];
    setup_1(b); // Not allowed to access a
    use_1(b);   // Not allowed to access a
}
Please also note that gcc 12.x and clang 15 both do the stack-reuse optimization for this scoped version. If you comment out the inner curly brackets, the optimization is (correctly!) not done.
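To make the aliasing problem in the original (un-scoped) version concrete, here is a minimal sketch of function bodies I made up for illustration (they are not from the question) in which a pointer to a escapes through a global and is read again while b is in use:
// Hypothetical bodies for the question's functions; they show one way a
// pointer to 'a' can escape and still be read after 'b' has been set up.
static const int *escaped = nullptr;   // keeps 'a' reachable globally

void setup_0(int *p) {
    for (int i = 0; i < 10; ++i) p[i] = i;
    escaped = p;                       // the address of 'a' escapes here
}

void use_0(const int *p) {
    (void)p;                           // reads p in a real program
}

void setup_1(int *p) {
    for (int i = 0; i < 10; ++i) p[i] = 10 * i;
}

int use_1(const int *p) {
    int sum = 0;
    for (int i = 0; i < 10; ++i)
        sum += p[i] + escaped[i];      // still reads 'a' through the global
    return sum;                        // would silently change if 'a' and 'b' overlapped
}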
Upvotes: 1
Reputation: 174
#include <cstdlib>
extern int a;
extern int b;
int main()
{
    {
        int tab[1];
        tab[0] = 42;
        a = tab[0];
    }
    {
        int tab[1];
        tab[0] = 42;
        b = tab[0];
    }
    return 0;
}
Compiled with gcc 7 with -O3 compilation flag:
main:
mov DWORD PTR a[rip], 42
mov DWORD PTR b[rip], 42
xor eax, eax
ret
If you try this yourself on gcc and clang with the -O3 optimisation level, you should see the same output. The resulting asm code is pretty straightforward: as the value stored in the array is known at compile time, the compiler can skip everything and set the variables a and b directly. Your buffer is not needed.
Next, code similar to the one provided in your example:
#include <cstdlib>
int func1(const int (&tab)[10]);
int func2(const int (&tab)[10]);
int main()
{
    int a[10];
    int b[10];
    func1(a);
    func2(b);
    return 0;
}
Compiled with gcc 7 with -O3 compilation flag:
main:
sub rsp, 104
mov rdi, rsp ; first address is rsp
call func1(int const (&) [10])
lea rdi, [rsp+48] ; second address is [rsp+48]
call func2(int const (&) [10])
xor eax, eax
add rsp, 104
ret
You can see that the pointers passed to func1 and func2 are different: the first one is rsp in the call to func1, and the second is rsp+48 in the call to func2.
So either the compiler eliminates your buffers entirely when the result is predictable, or, at least for gcc 7 and clang 3.9.1, it does not optimize them away at all.
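For completeness: if you give the two arrays disjoint lifetimes yourself, the compiler is allowed to overlap their stack slots (GCC even has a -fstack-reuse option governing this behaviour). Here is a sketch of that variant, reusing the same func1/func2 declarations; whether your compiler version actually folds the two slots into one is something you would have to check in the generated asm:
#include <cstdlib>
int func1(const int (&tab)[10]);
int func2(const int (&tab)[10]);
int main()
{
    {
        int a[10];   // lifetime of 'a' ends at the closing brace
        func1(a);
    }
    {
        int b[10];   // 'b' may legally occupy the same stack slot as 'a'
        func2(b);
    }
    return 0;
}
Finally, the same experiment with dynamic memory allocation: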
#include <cstdlib>
extern int * a;
extern int * b;
inline int do_stuff(int **to)
{
    *to = (int *) malloc(sizeof(int));
    (**to) = 42;
    return **to;
}

int main()
{
    do_stuff(&a);
    free(a);
    do_stuff(&b);
    free(b);
    return 0;
}
Compiled with gcc 7 with -O3 compilation flag:
main:
sub rsp, 8
mov edi, 4
call malloc
mov rdi, rax
mov QWORD PTR a[rip], rax
call free
mov edi, 4
call malloc
mov rdi, rax
mov QWORD PTR b[rip], rax
call free
xor eax, eax
add rsp, 8
ret
Even without being fluent at reading assembly, it is pretty easy to tell from the example above that malloc and free are not optimized away by either gcc or clang (if you want to try more compilers, suit yourself, but don't forget to set the optimization flag). You can clearly see a call to malloc followed by a call to free, twice.
Optimizing stack space is quite unlikely to have a real effect on the speed of your program unless you manipulate large amounts of data. Optimizing dynamically allocated memory is more relevant. AFAIK you will have to use a third-party library or roll your own system if you plan to do that, and this is not a trivial task.
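As a rough illustration of what "rolling your own system" can mean, here is a minimal sketch of a reusable scratch buffer that hands back the same allocation for both phases instead of calling malloc twice; the ScratchBuffer name and structure are made up for this example and are not a real library:
#include <cstddef>
#include <cstdlib>

// Sketch of manual reuse: one allocation is recycled for both phases.
struct ScratchBuffer {
    void *mem = nullptr;
    std::size_t size = 0;

    // Return a block of at least n bytes, reusing the previous allocation
    // whenever it is already big enough.
    void *get(std::size_t n) {
        if (n > size) {
            std::free(mem);
            mem = std::malloc(n);
            size = (mem != nullptr) ? n : 0;
        }
        return mem;
    }

    ~ScratchBuffer() { std::free(mem); }
};

int main()
{
    ScratchBuffer scratch;
    int *a = static_cast<int *>(scratch.get(10 * sizeof(int))); // first phase
    // ... set up and use 'a' here ...
    int *b = static_cast<int *>(scratch.get(10 * sizeof(int))); // same block handed back
    // 'a' must not be used past this point: it aliases 'b' now.
    // ... set up and use 'b' here ...
    (void)a;
    (void)b;
    return 0;
}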
EDIT: Forgot to mention the obvious: this is very compiler-dependent.
Upvotes: 2
Reputation: 244782
Yes, theoretically, a compiler could optimize the code as you describe, assuming that it could prove that these functions did not modify the arrays passed in as parameters.
But in practice, no, that does not happen. You can write a simple test case to verify this. I've avoided defining the helper functions so the compiler can't inline them, but passed the arrays by const-reference to ensure that the compiler knows the functions don't modify them:
void setup_0(const int (&p)[10]);
void use_0 (const int (&p)[10]);
void setup_1(const int (&p)[10]);
void use_1 (const int (&p)[10]);
void TestFxn()
{
    int a[10];
    int b[10];
    setup_0(a);
    use_0(a);
    setup_1(b);
    use_1(b);
}
As you can see here on Godbolt's Compiler Explorer, none of the compilers (GCC, Clang, ICC, or MSVC) will optimize this to use a single stack-allocated array of 10 elements. Of course, each compiler varies in how much space it allocates on the stack. Some of that is due to different calling conventions, which may or may not require a red zone. Otherwise, it's due to the optimizer's alignment preferences.
Taking GCC's output as an example, you can immediately tell that it is not reusing the array a. The following is the disassembly, with my annotations:
; Allocate 104 bytes on the stack
; by subtracting from the stack pointer, RSP.
; (The stack always grows downward on x86.)
sub rsp, 104
; Place the address of the top of the stack in RDI,
; which is how the array is passed to setup_0().
mov rdi, rsp
call setup_0(int const (&) [10])
; Since setup_0() may have clobbered the value in RDI,
; "refresh" it with the address at the top of the stack,
; and call use_0().
mov rdi, rsp
call use_0(int const (&) [10])
; We are now finished with array 'a', so compute the address
; 48 bytes above the top of the stack (RSP) and place the result
; in the RDI register.
lea rdi, [rsp+48]
; Now, RDI contains what is effectively the address of
; array 'b', so call setup_1().
; The parameter is passed in RDI, just like before.
call setup_1(int const (&) [10])
; Second verse, same as the first: "refresh" the address
; of array 'b' in RDI, since it might have been clobbered,
; and pass it to use_1().
lea rdi, [rsp+48]
call use_1(int const (&) [10])
; Clean up the stack by adding 104 bytes to compensate for the
; same 104 bytes that we subtracted at the top of the function.
add rsp, 104
ret
So, what gives? Are compilers just massively missing the boat here when it comes to an important optimization? No. Allocating space on the stack is extremely fast and cheap. There would be very little benefit in allocating ~50 bytes, as opposed to ~100 bytes. Might as well just play it safe and allocate enough space for both arrays separately.
There might be more of a benefit in reusing the stack space for the second array if both arrays were extremely large, but empirically, compilers don't do this, either.
Does this work with dynamic memory allocation? No. Emphatically no. I've never seen a compiler that optimizes around dynamic memory allocation like this, and I don't expect to see one. It just doesn't make sense. If you wanted to re-use the block of memory, you would have written the code to re-use it instead of allocating a separate block.
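In other words, the explicit re-use the paragraph above alludes to would look roughly like this (a sketch; the setup_*/use_* declarations are assumed to match the question's functions):
#include <cstdlib>

// Declarations assumed to match the question's functions.
void setup_0(int *p);
void use_0(const int *p);
void setup_1(int *p);
void use_1(const int *p);

void TestFxnReuse()
{
    int *block = (int *) std::malloc(sizeof(int) * 10);
    setup_0(block);
    use_0(block);
    setup_1(block); // overwrite the same block instead of allocating a new one
    use_1(block);
    std::free(block);
}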
I suppose you are thinking that if you had something like the following C code:
void TestFxn()
{
    int* a = malloc(sizeof(int) * 10);
    setup_0(a);
    use_0(a);
    free(a);

    int* b = malloc(sizeof(int) * 10);
    setup_1(b);
    use_1(b);
    free(b);
}
that the optimizer could see that you were freeing a, and then immediately re-allocating a block of the same size as b? Well, the optimizer won't recognize this and elide the back-to-back calls to free and malloc, but the run-time library (and/or operating system) very likely will. free is a very cheap operation, and since a block of the appropriate size was just released, allocation will also be very cheap. (Most run-time libraries maintain a private heap for the application and won't even return the memory to the operating system, so depending on the memory-allocation strategy, it's even possible that you get the exact same block back.)
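If you are curious, you can observe this with a tiny test program; whether the two addresses actually match depends entirely on the allocator, so treat it as an experiment rather than a guarantee:
#include <cstdio>
#include <cstdlib>

int main()
{
    void *first = std::malloc(10 * sizeof(int));
    std::printf("first  block: %p\n", first);
    std::free(first);

    // A block of the same size is requested right after the free; many
    // allocators simply hand the just-released block back, but nothing
    // in the language guarantees it.
    void *second = std::malloc(10 * sizeof(int));
    std::printf("second block: %p\n", second);
    std::free(second);
    return 0;
}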
Upvotes: -2
Reputation: 238351
Do compilers optimize this
This question can only be answered if you ask about a particular compiler. And the answer can be found by inspecting the generated code.
so that they reuse the existing memory blocks, instead of allocating a second one, if the compiler knows that the first block will not be referenced again?
Such optimization would not change the behaviour of the program, so it would be allowed. Another matter is: Is it possible to prove that the memory will not be referenced? If it is possible, then is it easy enough to prove in reasonable time? I feel very safe in saying that it is not possible to prove in general, but it is provable in some cases.
I assume this only works if setup and foo are implemented in the same c file (exist in the same object as the calling code)?
That would usually be required to prove the untouchability of the memory. Link time optimization might lift this requirement, in theory.
Does this also work with dynamic memory allocation?
In theory, yes, since it doesn't change the behaviour of the program. However, dynamic memory allocation is typically performed by a library, and thus the compiler may not be able to prove the lack of side effects and therefore wouldn't be able to prove that removing an allocation wouldn't change behaviour.
Is this also possible if the memory persists outside the scope, but is used in the same way as given in the example?
If the compiler is able to prove that the memory is leaked, then perhaps.
Even though the optimization may be possible, it is not very significant. Saving a bit of stack space probably has very little effect on run time. It could be useful to prevent stack overflows if the arrays are large.
Upvotes: 7
Reputation: 25286
As the compiler sees that a is used as a parameter for a function, it will not optimize b away. It can't, because it doesn't know what happens in the functions that use a and b. The same goes for a: the compiler doesn't know that a isn't used anymore.
As far as the compiler is concerned, the address of a could e.g. have been stored by setup_0 in a global variable and be used by setup_1 when it is called with b.
Upvotes: 1