Reputation: 4477
EDIT 1: Added another example (showing that GCC is, in principle, be capable to do what I want to achieve) and some more discussion at the end of this question.
EDIT 2: Found the malloc
function attribute, which should do what. Please take a look at the very end of the question.
This is a question about how to tell the compiler that stores to a memory area are not visible outside of a region (and thus could be optimized away). To illustrate what I mean, let's take a look at the following code
int f (int a)
{
int v[2];
v[0] = a;
v[1] = 0;
while (v[0]-- > 0)
v[1] += v[0];
return v[1];
}
gcc -O2
generates the following assembly code (x86-64 gcc, trunk, on https://godbolt.org):
f:
leal -1(%rdi), %edx
xorl %eax, %eax
testl %edi, %edi
jle .L4
.L3:
addl %edx, %eax
subl $1, %edx
cmpl $-1, %edx
jne .L3
ret
.L4:
ret
As one can see, the loads and stores into the array v
are gone after optimization.
Now consider the following code:
int g (int a, int *v)
{
v[0] = a;
v[1] = 0;
while (v[0]-- > 0)
v[1] += v[0];
return v[1];
}
The difference is that v
is not (stack-) allocated in the function, but provided as an argument. The result of gcc -O2
in this case is:
g:
leal -1(%rdi), %edx
movl $0, 4(%rsi)
xorl %eax, %eax
movl %edx, (%rsi)
testl %edi, %edi
jle .L4
.L3:
addl %edx, %eax
subl $1, %edx
cmpl $-1, %edx
jne .L3
movl %eax, 4(%rsi)
movl $-1, (%rsi)
ret
.L4:
ret
Clearly, the code has to store the final values of v[0]
and v[1]
in memory as they may be observable.
Now, what I am looking for is a way to tell the compiler that the memory pointed to by v
in the second example isn't accessible any more after the function g
has returned so that the compiler could optimize away the memory accesses.
To have an even simpler example:
void h (int *v)
{
v[0] = 0;
}
If the memory pointed to by v
isn't accessible after h
returns, it should be possible to simplify the function to a single ret
.
I tried to achieve what I want by playing with the strict aliasing rules but haven't succeeded.
ADDED IN EDIT 1:
GCC seems to have the necessary code built-in as the following example shows:
include <stdlib.h>
int h (int a)
{
int *v = malloc (2 * sizeof (int));
v[0] = a;
v[1] = 0;
while (v[0]-- > 0)
v[1] += v[0];
return v[1];
}
The generated code contains no loads and stores:
h:
leal -1(%rdi), %edx
xorl %eax, %eax
testl %edi, %edi
jle .L4
.L3:
addl %edx, %eax
subl $1, %edx
cmpl $-1, %edx
jne .L3
ret
.L4:
ret
In other words, GCC knows that changing the memory area pointed to by v
is not observable through any side-effect of malloc
. For purposes like this one, GCC has __builtin_malloc
.
So I can also ask: How can user code (say a user version of malloc
) make use of this functionality?
ADDED IN EDIT 2:
GCC has the following function attribute:
malloc
This tells the compiler that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P.
Using this attribute can improve optimization. Compiler predicts that a function with the attribute returns non-null in most cases. Functions like malloc and calloc have this property because they return a pointer to uninitialized or zeroed-out storage. However, functions like realloc do not have this property, as they can return a pointer to storage containing pointers.
It seems to do what I want as the following example shows:
__attribute__ (( malloc )) int *m (int *h);
int i (int a, int *h)
{
int *v = m (h);
v[0] = a;
v[1] = 0;
while (v[0]-- > 0)
v[1] += v[0];
return v[1];
}
The generated assembler code has no loads and stores:
i:
pushq %rbx
movl %edi, %ebx
movq %rsi, %rdi
call m
testl %ebx, %ebx
jle .L4
leal -1(%rbx), %edx
xorl %eax, %eax
.L3:
addl %edx, %eax
subl $1, %edx
cmpl $-1, %edx
jne .L3
popq %rbx
ret
.L4:
xorl %eax, %eax
popq %rbx
ret
However, as soon as the compiler sees a definition of m
, it may forget about the attribute. For example, this is the case when the following definition is given:
__attribute__ (( malloc )) int *m (int *h)
{
return h;
}
In that case, the function is inlined and the compiler forgets about the attribute, yielding the same code as the function g
.
P.S.: Initially, I thought that the restrict
keyword may help, but it doesn't seem so.
Upvotes: 4
Views: 502
Reputation: 4477
EDIT: Discussion about the noinline
attribute added at the end.
Using the following function definition, one can achieve the goal of my question:
__attribute__ (( malloc, noinline )) static void *get_restricted_ptr (void *p)
{
return p;
}
This function get_restricted_ptr
simply returns its pointer argument but informs the compiler that the returned pointer P cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P.
The use of this function is demonstrated here:
int i (int a, int *h)
{
int *v = get_restricted_ptr (h);
v[0] = a;
v[1] = 0;
while (v[0]-- > 0)
v[1] += v[0];
return;
}
The generated code does not contain loads and stores:
i:
leal -1(%rdi), %edx
xorl %eax, %eax
testl %edi, %edi
jle .L6
.L5:
addl %edx, %eax
subl $1, %edx
cmpl $-1, %edx
jne .L5
ret
.L6:
ret
ADDED IN EDIT: If the noinline
attribute is left out, GCC ignores the malloc
attribute. Apparently, in this case, the function gets inlined first so that there is no function call any more for which GCC would check the malloc
attribute. (One can discuss whether this behaviour should be considered a bug in GCC.) With the noinline
attribute, the function doesn't get inlined. Then, due to the malloc
attribute, GCC understands that the call to that function is unnecessary and removes it completely.
Unfortunately, this means that the (trivial) function won't be inlined when its call is not eliminated due to the malloc
attribute.
Upvotes: 1
Reputation: 37222
For C, the only restriction is that the compiler has to ensure that the code behaves the same. If the compiler can prove that the code behaves the same then it can and will remove the stores.
For example, I put this into https://godbolt.org/ :
void h (int *v)
{
v[0] = 0;
}
void foo() {
int v[2] = {1, 2};
h(v);
}
And told it to use GCC 8.2 and "-O3", and got this output:
h(int*):
mov DWORD PTR [rdi], 0
ret
foo():
ret
Note that there are two different versions of the function h()
in the output. The first version exists in case other code (in other object files) want to use the function (and may be discarded by the linker). The second version of h()
was inlined directly into foo()
and then optimised down to absolutely nothing.
If you change the code to this:
static void h (int *v)
{
v[0] = 0;
}
void foo() {
int v[2] = {1, 2};
h(v);
}
Then it tells the compiler that the version of h()
that only existed for linking with other object files isn't needed, so the compiler only generates the second version of h()
and the output becomes this:
foo():
ret
Of course all optimizers in all compiler's aren't perfect - for more complex code (and for different compilers including different versions of GCC) results might be different (the compiler may fail to do this optimization). This is purely a limitation of the compiler's optimizer and not a limitation of C itself.
For cases where the compiler's optimiser isn't good enough, there are 4 possible solutions:
get a better compiler
improve the compiler's optimiser (e.g. send an email with to the compiler's developers that includes a minimal example and cross your fingers)
modify the code to make it easier for the compiler's optimiser (e.g. copy the input array into a local array, like "void h(int *v) { int temp[2]; temp[0] = v[0]; temp[1] = v[1]; ...
).
shrug and say "Oh, that's a pity" and do nothing
Upvotes: 0
Reputation: 67835
Both functions have side effects and memory reads & stores cannot be optimized out
void h (int *v)
{
v[0] = 0;
}
and
int g (int a, int *v)
{
v[0] = a;
v[1] = 0;
while (v[0]-- > 0)
v[1] += v[0];
return v[1];
}
The side effects have to be observable outside the function scope. Inline functions may have another behavior as the side effect might have to be observable outside the enclosing code.
inline int g (int a, int *v)
{
v[0] = a;
v[1] = 0;
while (v[0]-- > 0)
v[1] += v[0];
return v[1];
}
void h(void)
{
int x[2],y ;
g(y,x);
}
this code will be optimized to just a simple return
You can promise the compiler that nothing will happen to allow easier optimizations by using keyword restrict. But of course your code must keep this promise.
Upvotes: 0