alexpbell

Reputation: 85

Why doesn't this C vector loop auto-vectorise?

I am trying to optimise some code using AVX intrinsics. A very simple test case compiles, but gcc reports that my loop was not vectorised, giving a number of reasons that I don't understand.

This is the full program, simple.c:

#include <math.h>
#include <stdlib.h>
#include <assert.h>
#include <immintrin.h>

int main(void)
{

  __m256 * x = (__m256 *) calloc(1024,sizeof(__m256));    

  for (int j=0;j<32;j++)
    x[j] = _mm256_set1_ps(1.); 

  return(0);
}

This is the command line: gcc simple.c -O1 -fopenmp -ffast-math -lm -mavx2 -ftree-vectorize -fopt-info-vec-missed

This is the output:

I have gcc version 5.4.

Can anyone help me to interpret these messages and to understand what is going on?

Upvotes: 3

Views: 605

Answers (1)

Peter Cordes

Reputation: 364029

You're already manually vectorizing with intrinsics, so there's nothing left for gcc to auto-vectorize. The missed-vectorization messages are therefore uninteresting; I assume they come from gcc trying to auto-vectorize the intrinsic itself or the loop-counter increments.

I get good asm from gcc 5.3 (on the Godbolt compiler explorer) as long as I avoid two silly things: writing a function whose work can optimize away (as in your main, where nothing ever reads x), and compiling with only -O1.

#include <immintrin.h>

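// Taking the pointer as an argument means the stores are visible to the caller and can't be optimized away.
void set_to_1(__m256 *x) {
  for (int j = 0; j < 32; j++)
    x[j] = _mm256_set1_ps(1.0f);  // broadcast 1.0f to all 8 float lanes
}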

    push    rbp
    lea     rax, [rdi+1024]
    vmovaps ymm0, YMMWORD PTR .LC0[rip]
    mov     rbp, rsp
    push    r10                      # gcc is weird with r10 in functions with ymm vectors
.L2:                                 # this is the vector loop
    vmovaps YMMWORD PTR [rdi], ymm0
    add     rdi, 32
    cmp     rdi, rax
    jne     .L2
    vzeroupper
    pop     r10
    pop     rbp
    ret

.LC0:
    .long   1065353216
    ... repeated several times because gcc failed to use a vbroadcastss load or generate the constant on the fly

I do actually get nearly the same asm at -O1, but relying on -O1 to keep things from being optimized away isn't a good way to see what gcc will really do.
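For contrast with the intrinsics version above, here is a minimal sketch of the kind of loop the auto-vectorizer is actually meant for: plain scalar code with no intrinsics. The name set_to_1_scalar and the n parameter are hypothetical, made up for illustration; they are not from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar counterpart of set_to_1 (name and signature are
   assumptions for illustration). There are no intrinsics here, so the
   loop itself is a candidate for gcc's auto-vectorizer. */
void set_to_1_scalar(float *x, size_t n)
{
    assert(x != NULL);
    for (size_t j = 0; j < n; j++)
        x[j] = 1.0f;  /* plain float store; the compiler picks the vector width */
}
```

Calling it with n = 256 writes the same 1024 bytes as the 32 __m256 stores above. Compiling with something like gcc -O3 -mavx2 -fopt-info-vec should tell you whether this loop was vectorized, though the exact messages vary between gcc versions.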

Upvotes: 3
