Freddie Witherden

Reputation: 2426

GCC Aliasing Checks w/Restrict pointers

Consider the following two snippets:

#define ALIGN_BYTES 32
#define ASSUME_ALIGNED(x) x = __builtin_assume_aligned(x, ALIGN_BYTES)

void fn0(const float *restrict a0, const float *restrict a1,
         float *restrict b, int n)
{
    ASSUME_ALIGNED(a0); ASSUME_ALIGNED(a1); ASSUME_ALIGNED(b);

    for (int i = 0; i < n; ++i)
        b[i] = a0[i] + a1[i];
}

void fn1(const float *restrict *restrict a, float *restrict b, int n)
{
    ASSUME_ALIGNED(a[0]); ASSUME_ALIGNED(a[1]); ASSUME_ALIGNED(b);

    for (int i = 0; i < n; ++i)
        b[i] = a[0][i] + a[1][i];
}

When I compile these functions with gcc-4.7.2 -Ofast -march=native -std=c99 -ftree-vectorizer-verbose=5 -S test.c -Wall, I find that GCC inserts aliasing checks for the second function.

How can I prevent this, so that the resulting assembly for fn1 is the same as that for fn0? (When the number of parameters increases from three to, say, 30, the argument-passing approach (fn0) becomes cumbersome and the number of aliasing checks in the fn1 approach becomes ridiculous.)

Assembly (x86-64, AVX capable chip); aliasing cruft at .LFB10

fn0:
.LFB9:
    .cfi_startproc
    testl   %ecx, %ecx
    jle .L1
    movl    %ecx, %r10d
    shrl    $3, %r10d
    leal    0(,%r10,8), %r9d
    testl   %r9d, %r9d
    je  .L8
    cmpl    $7, %ecx
    jbe .L8
    xorl    %eax, %eax
    xorl    %r8d, %r8d
    .p2align 4,,10
    .p2align 3
.L4:
    vmovaps (%rsi,%rax), %ymm0
    addl    $1, %r8d
    vaddps  (%rdi,%rax), %ymm0, %ymm0
    vmovaps %ymm0, (%rdx,%rax)
    addq    $32, %rax
    cmpl    %r8d, %r10d
    ja  .L4
    cmpl    %r9d, %ecx
    je  .L1
.L3:
    movslq  %r9d, %rax
    salq    $2, %rax
    addq    %rax, %rdi
    addq    %rax, %rsi
    addq    %rax, %rdx
    xorl    %eax, %eax
    .p2align 4,,10
    .p2align 3
.L6:
    vmovss  (%rsi,%rax,4), %xmm0
    vaddss  (%rdi,%rax,4), %xmm0, %xmm0
    vmovss  %xmm0, (%rdx,%rax,4)
    addq    $1, %rax
    leal    (%r9,%rax), %r8d
    cmpl    %r8d, %ecx
    jg  .L6
.L1:
    vzeroupper
    ret
.L8:
    xorl    %r9d, %r9d
    jmp .L3
    .cfi_endproc
.LFE9:
    .size   fn0, .-fn0
    .p2align 4,,15
    .globl  fn1
    .type   fn1, @function
fn1:
.LFB10:
    .cfi_startproc
    testq   %rdx, %rdx
    movq    (%rdi), %r8
    movq    8(%rdi), %r9
    je  .L12
    leaq    32(%rsi), %rdi
    movq    %rdx, %r10
    leaq    32(%r8), %r11
    shrq    $3, %r10
    cmpq    %rdi, %r8
    leaq    0(,%r10,8), %rax
    setae   %cl
    cmpq    %r11, %rsi
    setae   %r11b
    orl %r11d, %ecx
    cmpq    %rdi, %r9
    leaq    32(%r9), %r11
    setae   %dil
    cmpq    %r11, %rsi
    setae   %r11b
    orl %r11d, %edi
    andl    %edi, %ecx
    cmpq    $7, %rdx
    seta    %dil
    testb   %dil, %cl
    je  .L19
    testq   %rax, %rax
    je  .L19
    xorl    %ecx, %ecx
    xorl    %edi, %edi
    .p2align 4,,10
    .p2align 3
.L15:
    vmovaps (%r9,%rcx), %ymm0
    addq    $1, %rdi
    vaddps  (%r8,%rcx), %ymm0, %ymm0
    vmovaps %ymm0, (%rsi,%rcx)
    addq    $32, %rcx
    cmpq    %rdi, %r10
    ja  .L15
    cmpq    %rax, %rdx
    je  .L12
    .p2align 4,,10
    .p2align 3
.L20:
    vmovss  (%r9,%rax,4), %xmm0
    vaddss  (%r8,%rax,4), %xmm0, %xmm0
    vmovss  %xmm0, (%rsi,%rax,4)
    addq    $1, %rax
    cmpq    %rax, %rdx
    ja  .L20
.L12:
    vzeroupper
    ret
.L19:
    xorl    %eax, %eax
    jmp .L20
    .cfi_endproc

Upvotes: 35

Views: 1875

Answers (5)

supercat

Reputation: 81169

Nearly all of the performance advantages that could be reaped via the use of restrict involve one of two usage patterns:

  1. A restrict qualifier applied directly to a named function argument [as opposed to something pointed to thereby]

  2. A restrict qualifier applied directly to a named automatic object which has an initializer.

In both of those contexts, it would be clear that the qualifier "guards" storage accessed by pointers based upon the initial value of the named object, and that the term of such guarding extends from the time the object is initialized until the end of its lifetime.
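As a minimal sketch of the two patterns (the function names add_args and add_locals are hypothetical and not taken from the question): the first qualifies the named parameters directly, while the second copies pointers out of an array into restrict-qualified automatic objects that have initializers. Whether a particular compiler actually drops its runtime overlap checks for either form has to be confirmed by inspecting the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Pattern 1: restrict applied directly to named function parameters. */
void add_args(const float *restrict a0, const float *restrict a1,
              float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        b[i] = a0[i] + a1[i];
}

/* Pattern 2: restrict applied to named automatic objects that have
   initializers. The outer pointer 'a' carries no restrict qualifier. */
void add_locals(const float *const *a, float *b_in, size_t n)
{
    assert(a != NULL && b_in != NULL);

    const float *restrict a0 = a[0];
    const float *restrict a1 = a[1];
    float *restrict b = b_in;

    for (size_t i = 0; i < n; ++i)
        b[i] = a0[i] + a1[i];
}
```

In both functions the caller remains responsible for passing non-overlapping arrays; restrict is a promise to the compiler, not something it checks.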

If a restrict qualifier is used in any other circumstance, it's far less clear what the semantics should be. While the Standard attempts to specify how other uses should work, I'm unaware of any compilers that try to honor them.

Given, for example:

extern int x,y;

int *xx = &x, *yy = &y;
int *restrict *restrict pp;

pp = &xx;
int *q = *pp;
*q = 1;
pp = &yy;
... other code

If q is never used after the *q=1; shown above, should the "restrict" qualifier on *pp continue to guard x even after pp itself is changed to point to yy? Is there any evidence that the Committee has considered such issues and reached any consensus, or that compiler writers attempt to meaningfully handle such cases?

Meaningful handling of the restrict qualifier requires that the "guarded pointer value" established thereby has a well-defined lifetime. Trying to handle cases beyond the two described above would require substantial effort while offering relatively minimal benefit.

If the example code were changed to use a declaration int *restrict q = *pp;, then it would be clear that the value of x would be protected in "other code" if it was within the scope of q, but that would be true regardless of whether the compiler recognized the outer-level restrict qualifier on pp. So why bother with such complications?

Upvotes: 0

Jeff Hammond

Reputation: 5642

I apologize in advance, because I cannot reproduce the results with GCC 4.7 on my machine, but there are two possible solutions.

  1. Use typedef to compose a * restrict * restrict properly. According to a former colleague who develops the LLVM compiler, this is the single exception to typedef behaving like the preprocessor in C, and it exists to allow the anti-aliasing behavior you desire.

    I attempted this below but I'm not sure I succeeded. Please fact-check my attempt carefully.

  2. Use the syntax described in the answers to "using restrict qualifier with C99 variable length arrays (VLAs)".

    I attempted this below but I'm not sure I succeeded. Please fact-check my attempt carefully.

Here is the code I used for my experiments. I was not able to determine conclusively whether either of my suggestions worked as desired.

#define ALIGN_BYTES 32
#define ASSUME_ALIGNED(x) x = __builtin_assume_aligned(x, ALIGN_BYTES)

void fn0(const float *restrict a0, const float *restrict a1,
         float *restrict b, int n)
{
    ASSUME_ALIGNED(a0); ASSUME_ALIGNED(a1); ASSUME_ALIGNED(b);

    for (int i = 0; i < n; ++i)
        b[i] = a0[i] + a1[i];
}

#if defined(ARRAY_RESTRICT)
void fn1(const float *restrict a[restrict], float * restrict b, int n)
#elif defined(TYPEDEF_SOLUTION)
typedef float * restrict frp;
void fn1(const frp *restrict a, float *restrict b, int n)
#else
void fn1(const float *restrict *restrict a, float *restrict b, int n)
#endif
{
    //ASSUME_ALIGNED(a[0]); ASSUME_ALIGNED(a[1]); ASSUME_ALIGNED(b);

    for (int i = 0; i < n; ++i)
        b[i] = a[0][i] + a[1][i];
}

Upvotes: 0

PhD AP EcE

Reputation: 3991

There is a way to tell the compiler to stop checking for aliasing.

Add the line:

#pragma GCC ivdep

right in front of the loop you want to vectorize. For more information, see:

https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Loop-Specific-Pragmas.html
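
A minimal sketch of where the pragma goes, assuming a GCC release that supports it (the linked documentation is for 4.9.2, which is newer than the 4.7.2 used in the question). The function name fn1_ivdep and the local names a0, a1 and out are hypothetical. The pragma is a promise that the loop has no loop-carried dependencies, so the caller must still make sure that b does not overlap a[0] or a[1].

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_BYTES 32

/* Hypothetical variant of fn1 from the question. */
void fn1_ivdep(const float *const *a, float *b, int n)
{
    /* Debug-build checks that the alignment promised below really holds. */
    assert((uintptr_t)a[0] % ALIGN_BYTES == 0);
    assert((uintptr_t)a[1] % ALIGN_BYTES == 0);
    assert((uintptr_t)b % ALIGN_BYTES == 0);

    const float *a0 = __builtin_assume_aligned(a[0], ALIGN_BYTES);
    const float *a1 = __builtin_assume_aligned(a[1], ALIGN_BYTES);
    float *out = __builtin_assume_aligned(b, ALIGN_BYTES);

    /* Tells GCC to assume there are no loop-carried dependencies that
       would prevent vectorizing the loop that follows. */
#pragma GCC ivdep
    for (int i = 0; i < n; ++i)
        out[i] = a0[i] + a1[i];
}
```

Compiling with the vectorizer's verbose output enabled and comparing the assembly against fn0 is the only reliable way to tell whether the overlap checks are actually gone for a given compiler version.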

Upvotes: 1

dhein

Reputation: 6555

Well, what about the flag

-fno-strict-aliasing

?

If I understood you correctly, you just want to know how to turn these checks off? If that's all, passing this option on the gcc command line should help.

EDIT:

In response to your comment: isn't it forbidden to use const-qualified restrict pointers?

This is from ISO/IEC 9899 (6.7.3.1 Formal definition of restrict):

1. Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.

4. During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.

A much more interesting point, the same as with register, is this one:

6. A translator is free to ignore any or all aliasing implications of uses of restrict.

So if you can't find a command-line option that forces gcc to do so, it's probably not possible, because the standard does not require it to offer one.

Upvotes: 0

Valeri Atamaniouk

Reputation: 5163

Can this help?

void fn1(const float **restrict a, float *restrict b, int n)
{
    const float * restrict a0 = a[0];
    const float * restrict a1 = a[1];

    ASSUME_ALIGNED(a0); ASSUME_ALIGNED(a1); ASSUME_ALIGNED(b);

    for (int i = 0; i < n; ++i)
        b[i] = a0[i] + a1[i];
}

Edit: second try :). With information from http://locklessinc.com/articles/vectorize/

gcc -ffast-math ...

Upvotes: 0
