Reputation: 2426
Consider the following two snippets:
#define ALIGN_BYTES 32
#define ASSUME_ALIGNED(x) x = __builtin_assume_aligned(x, ALIGN_BYTES)
void fn0(const float *restrict a0, const float *restrict a1,
float *restrict b, int n)
{
ASSUME_ALIGNED(a0); ASSUME_ALIGNED(a1); ASSUME_ALIGNED(b);
for (int i = 0; i < n; ++i)
b[i] = a0[i] + a1[i];
}
void fn1(const float *restrict *restrict a, float *restrict b, int n)
{
ASSUME_ALIGNED(a[0]); ASSUME_ALIGNED(a[1]); ASSUME_ALIGNED(b);
for (int i = 0; i < n; ++i)
b[i] = a[0][i] + a[1][i];
}
When I compile the function as gcc-4.7.2 -Ofast -march=native -std=c99 -ftree-vectorizer-verbose=5 -S test.c -Wall
I find that GCC inserts aliasing checks for the second function.
How can I prevent this, so that the resulting assembly for fn1 is the same as that for fn0? (When the number of parameters increases from three to, say, 30, the argument-passing approach (fn0) becomes cumbersome and the number of aliasing checks in the fn1 approach becomes ridiculous.)
Assembly (x86-64, AVX capable chip); aliasing cruft at .LFB10
fn0:
.LFB9:
.cfi_startproc
testl %ecx, %ecx
jle .L1
movl %ecx, %r10d
shrl $3, %r10d
leal 0(,%r10,8), %r9d
testl %r9d, %r9d
je .L8
cmpl $7, %ecx
jbe .L8
xorl %eax, %eax
xorl %r8d, %r8d
.p2align 4,,10
.p2align 3
.L4:
vmovaps (%rsi,%rax), %ymm0
addl $1, %r8d
vaddps (%rdi,%rax), %ymm0, %ymm0
vmovaps %ymm0, (%rdx,%rax)
addq $32, %rax
cmpl %r8d, %r10d
ja .L4
cmpl %r9d, %ecx
je .L1
.L3:
movslq %r9d, %rax
salq $2, %rax
addq %rax, %rdi
addq %rax, %rsi
addq %rax, %rdx
xorl %eax, %eax
.p2align 4,,10
.p2align 3
.L6:
vmovss (%rsi,%rax,4), %xmm0
vaddss (%rdi,%rax,4), %xmm0, %xmm0
vmovss %xmm0, (%rdx,%rax,4)
addq $1, %rax
leal (%r9,%rax), %r8d
cmpl %r8d, %ecx
jg .L6
.L1:
vzeroupper
ret
.L8:
xorl %r9d, %r9d
jmp .L3
.cfi_endproc
.LFE9:
.size fn0, .-fn0
.p2align 4,,15
.globl fn1
.type fn1, @function
fn1:
.LFB10:
.cfi_startproc
testq %rdx, %rdx
movq (%rdi), %r8
movq 8(%rdi), %r9
je .L12
leaq 32(%rsi), %rdi
movq %rdx, %r10
leaq 32(%r8), %r11
shrq $3, %r10
cmpq %rdi, %r8
leaq 0(,%r10,8), %rax
setae %cl
cmpq %r11, %rsi
setae %r11b
orl %r11d, %ecx
cmpq %rdi, %r9
leaq 32(%r9), %r11
setae %dil
cmpq %r11, %rsi
setae %r11b
orl %r11d, %edi
andl %edi, %ecx
cmpq $7, %rdx
seta %dil
testb %dil, %cl
je .L19
testq %rax, %rax
je .L19
xorl %ecx, %ecx
xorl %edi, %edi
.p2align 4,,10
.p2align 3
.L15:
vmovaps (%r9,%rcx), %ymm0
addq $1, %rdi
vaddps (%r8,%rcx), %ymm0, %ymm0
vmovaps %ymm0, (%rsi,%rcx)
addq $32, %rcx
cmpq %rdi, %r10
ja .L15
cmpq %rax, %rdx
je .L12
.p2align 4,,10
.p2align 3
.L20:
vmovss (%r9,%rax,4), %xmm0
vaddss (%r8,%rax,4), %xmm0, %xmm0
vmovss %xmm0, (%rsi,%rax,4)
addq $1, %rax
cmpq %rax, %rdx
ja .L20
.L12:
vzeroupper
ret
.L19:
xorl %eax, %eax
jmp .L20
.cfi_endproc
Upvotes: 35
Views: 1875
Reputation: 81169
Nearly all of the performance advantages that could be reaped via the use of restrict involve one of two usage patterns:
1. A restrict qualifier applied directly to a named function argument [as opposed to something pointed to thereby].
2. A restrict qualifier applied directly to a named automatic object which has an initializer.
In both of those contexts, it would be clear that the qualifier "guards" storage accessed by pointers based upon the initial value of the named object, and that the term of such guarding extends from the time the object is initialized until the end of its lifetime.
If a restrict qualifier is used in any other circumstance, it's far less clear what the semantics should be. While the Standard attempts to specify how other types should work, I'm unaware of any compilers trying to apply them.
Given, for example:
extern int x,y;
int *xx = &x, *yy = &y;
int *restrict *restrict pp;
pp = &xx;
int *q = *pp;
*q = 1;
pp = &yy;
... other code
If q is never used after the *q=1; shown above, should the "restrict" qualifier on *pp continue to guard x even after pp itself is changed to point to yy? Is there any evidence that the Committee has considered such issues and reached any consensus, or that compiler writers attempt to meaningfully handle such cases?
Meaningful handling of the restrict qualifier requires that the "guarded pointer value" established thereby has a well-defined lifetime. Trying to handle cases beyond the two described above would require substantial effort while offering relatively minimal benefit.
If the example code were changed to use a declaration int *restrict q = *pp; then it would be clear that the value of x would be protected in "other code" if it was within the scope of q, but that would be true regardless of whether the compiler recognized the outer-level restrict qualifier on pp. So why bother with such complications?
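As a rough illustration of the first pattern applied to the question's problem, the array-of-pointers interface can be kept as a thin wrapper around a kernel whose restrict qualifiers sit directly on named parameters. The names add_kernel and fn1_wrapped are hypothetical, and this is only a sketch: whether a given GCC version keeps the restrict information once the kernel is inlined is something to confirm by inspecting the generated assembly.

```c
/* Kernel in the style of fn0: restrict is applied directly to the named
   parameters, so the guarded pointers have a clear lifetime (the call). */
static inline void add_kernel(const float *restrict a0,
                              const float *restrict a1,
                              float *restrict b, int n)
{
    for (int i = 0; i < n; ++i)
        b[i] = a0[i] + a1[i];
}

/* Hypothetical wrapper that keeps the array-of-pointers interface.
   The caller must still guarantee that a[0], a[1] and b do not overlap;
   otherwise the behavior is undefined. */
void fn1_wrapped(const float *const *a, float *b, int n)
{
    add_kernel(a[0], a[1], b, n);
}
```

If the checks reappear after inlining, compiling the kernel as a separate, non-inlined function is one thing to try, at the cost of a call.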
Upvotes: 0
Reputation: 5642
I apologize in advance, because I cannot reproduce results with GCC 4.7 on my machine, but there are two possible solutions.
1. Use typedef to compose a * restrict * restrict properly. This is, according to a former colleague who develops the LLVM compiler, the single exception to typedef behaving like the preprocessor in C, and it exists to allow the anti-aliasing behavior you desire.
I attempted this below but I'm not sure I succeeded. Please fact-check my attempt carefully.
2. Use the syntax described in the answers to "using restrict qualifier with C99 variable length arrays (VLAs)".
I attempted this below but I'm not sure I succeeded. Please fact-check my attempt carefully.
Here is the code I used to perform my experiments, but I was not able to determine conclusively if either of my suggestions worked as desired.
#define ALIGN_BYTES 32
#define ASSUME_ALIGNED(x) x = __builtin_assume_aligned(x, ALIGN_BYTES)
void fn0(const float *restrict a0, const float *restrict a1,
float *restrict b, int n)
{
ASSUME_ALIGNED(a0); ASSUME_ALIGNED(a1); ASSUME_ALIGNED(b);
for (int i = 0; i < n; ++i)
b[i] = a0[i] + a1[i];
}
#if defined(ARRAY_RESTRICT)
void fn1(const float *restrict a[restrict], float * restrict b, int n)
#elif defined(TYPEDEF_SOLUTION)
typedef float * restrict frp;
void fn1(const frp *restrict a, float *restrict b, int n)
#else
void fn1(const float *restrict *restrict a, float *restrict b, int n)
#endif
{
//ASSUME_ALIGNED(a[0]); ASSUME_ALIGNED(a[1]); ASSUME_ALIGNED(b);
for (int i = 0; i < n; ++i)
b[i] = a[0][i] + a[1][i];
}
Upvotes: 0
Reputation: 3991
There is a way to tell the compiler to stop checking for aliasing. Add the line:
#pragma GCC ivdep
immediately before the loop you want to vectorize. For more information, see:
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Loop-Specific-Pragmas.html
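A minimal sketch of where the pragma goes, assuming GCC 4.9 or later (the linked page is from the 4.9 documentation, so the 4.7.2 compiler in the question would not honor it). The function name fn1_ivdep is made up for illustration. The pragma is a promise from the programmer that the loop has no loop-carried dependences, so it is only valid if the arrays really do not overlap.

```c
/* Hypothetical variant of fn1 from the question. The element pointers are
   loaded once before the loop so that the pragma sits directly in front
   of the for statement it applies to. */
void fn1_ivdep(const float *const *a, float *b, int n)
{
    const float *a0 = a[0];
    const float *a1 = a[1];

    /* Asserts that iterations are independent, so the vectorizer does
       not need runtime overlap checks for this loop. */
#pragma GCC ivdep
    for (int i = 0; i < n; ++i)
        b[i] = a0[i] + a1[i];
}
```

Compilers that do not know the pragma will typically ignore it, so the code stays correct either way; only the generated checks differ.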
Upvotes: 1
Reputation: 6555
Well, what about the flag -fno-strict-aliasing?
If I understood you correctly, you just want to know how to turn these checks off. If that's all, passing this option on the gcc command line should help.
EDIT:
In addition to your comment: isn't it forbidden to use const type restrict pointers?
This is from ISO/IEC 9899 (6.7.3.1 Formal definition of restrict):
1.
Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.
4.
During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.
And a much more interesting point, same as with register is this one:
6.
A translator is free to ignore any or all aliasing implications of uses of restrict.
So if you can't find a command-line option that forces gcc to do so, it's probably not possible, because the standard does not require the compiler to offer one.
Upvotes: 0
Reputation: 5163
Can this help?
void fn1(const float **restrict a, float *restrict b, int n)
{
const float * restrict a0 = a[0];
const float * restrict a1 = a[1];
ASSUME_ALIGNED(a0); ASSUME_ALIGNED(a1); ASSUME_ALIGNED(b);
for (int i = 0; i < n; ++i)
b[i] = a0[i] + a1[i];
}
Edit: second try :). With information from http://locklessinc.com/articles/vectorize/
gcc -ffast-math ...
Upvotes: 0