Raphael D.

Reputation: 778

Are SSE2 instructions enabled?

I have a very simple C++ program (a minimal example of what I am actually doing) using SSE2 intrinsics.

#include <xmmintrin.h>
int main(){
    __m128d a = {0,0};
    __m128d b = {1,1};
    __m128d c = a + b;
    int t = c[0] >= 1;
    return t;
}

I would like to check that the addition is indeed compiled to vectorized instructions. I compile the file with g++ -S test.cpp.

My understanding is that if I don't pass the -msse2 flag to g++, SSE2 is not enabled. This seems to be confirmed by the output of g++ -Q --help=target:

  -msse                             [disabled]
  -msse2                            [disabled]
  -msse2avx                         [disabled]
  -msse3                            [disabled]
  -msse4                            [disabled]
  -msse4.1                          [disabled]
  -msse4.2                          [disabled]
  -msse4a                           [disabled]

However, the generated assembly appears to use the addpd instruction:

main:
.LFB499:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $80, %rsp
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    pxor    %xmm0, %xmm0
    movaps  %xmm0, -48(%rbp)
    movapd  .LC0(%rip), %xmm0
    movaps  %xmm0, -32(%rbp)
    movapd  -48(%rbp), %xmm0
    addpd   -32(%rbp), %xmm0
    movaps  %xmm0, -64(%rbp)
    movsd   -64(%rbp), %xmm0
    pxor    %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    setnb   %al
    movzbl  %al, %eax
    movl    %eax, -68(%rbp)
    movl    -68(%rbp), %eax
    movq    -8(%rbp), %rdx
    xorq    %fs:40, %rdx
    je  .L3
    call    __stack_chk_fail
.L3:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE499:
    .size   main, .-main
    .section    .rodata
    .align 16
.LC0:
    .long   0
    .long   1072693248
    .long   0
    .long   1072693248
    .ident  "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609"
    .section    .note.GNU-stack,"",@progbits

I see a contradiction here, which makes me think there is something I don't understand. Is SSE2 enabled or not?

Upvotes: 1

Views: 1395

Answers (1)

Peter Cordes

Reputation: 365576

I can't repro your results.

x86-64 g++ does enable -msse and -msse2 by default. You can disable SSE code-gen in 64-bit mode with -mno-sse (even though SSE2 is baseline for x86-64), in which case gcc implements the + operator with x87 fld / faddp.
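A quick way to see what a particular compile actually has enabled is to test the predefined macro __SSE2__, which GCC and Clang define when SSE2 code generation is on. The following is only a sketch; the helper names sse2_enabled and report_sse2 are made up for illustration.

```cpp
#include <cstdio>

// True if this translation unit was compiled with SSE2 code-gen enabled.
// GCC and Clang predefine __SSE2__ in that case (e.g. by default on x86-64,
// or with -msse2 for 32-bit targets).
constexpr bool sse2_enabled() {
#ifdef __SSE2__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: prints the status for the current build.
void report_sse2() {
    std::printf("SSE2 code generation: %s\n",
                sse2_enabled() ? "enabled" : "disabled");
}
```

Building this once normally and once with -mno-sse (calling report_sse2() from main) should show the two different states.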

__m128d is defined as a GNU C native vector with two double elements, and you didn't actually use any intrinsics: the {} braced init lists and the + operator are GNU-extension syntax that treats __m128d as a native vector. If you'd used _mm_set_pd or _mm_add_pd instead, then with SSE disabled you'd get:

<source>:5:13: error: SSE register return with SSE disabled
     __m128d c = _mm_add_pd(a, b);

The interesting thing is that even with SSE2 disabled, gcc will still parse xmmintrin.h without error, but only at -O0. With optimization enabled, it notices that the header contains all these (inline) functions that return in an SSE register while SSE is disabled, even if you never call them.
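For comparison, here is roughly what the question's code looks like when it really does use intrinsics. This is a sketch that assumes an x86 target with SSE2 enabled; the function name is invented for the example.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_set_pd, _mm_add_pd, ...

// Same computation as the question, but through the intrinsic functions
// rather than GNU native-vector syntax. Requires SSE2 to be enabled.
int low_element_at_least_one_intrin() {
    __m128d a = _mm_setzero_pd();        // {0.0, 0.0}
    __m128d b = _mm_set_pd(1.0, 1.0);    // {1.0, 1.0}
    __m128d c = _mm_add_pd(a, b);        // packed double add
    return _mm_cvtsd_f64(c) >= 1.0;      // extract the low element and compare
}
```

Compiled with -mno-sse, this version is the one that should hit the "SSE register return with SSE disabled" error rather than silently falling back to x87.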

You could work around that by defining a vector type yourself, like
typedef double v2d __attribute__((vector_size(16)));
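A minimal sketch of the question's code using such a hand-defined type, with no intrinsics header at all. The function name is made up, and the vectors are kept local so nothing has to be passed or returned in an SSE register.

```cpp
// GNU C vector extension: a 16-byte vector holding two doubles.
typedef double v2d __attribute__((vector_size(16)));

// Mirrors the question's main(): returns 1 if the low element of a + b is >= 1.
int low_element_at_least_one() {
    v2d a = {0.0, 0.0};
    v2d b = {1.0, 1.0};
    v2d c = a + b;        // element-wise add; the compiler chooses the instructions
    return c[0] >= 1.0;   // GNU vectors support subscripting
}
```

How the + is lowered depends on the target flags: with SSE2 enabled gcc can use addpd, and with SSE disabled it has to fall back to scalar code.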


On the Godbolt compiler explorer, gcc8.2 -m32 is configured with SSE2 enabled by default (even though SSE2 is not baseline for 32-bit in general).

But gcc6.3 -m32 doesn't enable SSE2 by default, as you can see in the -Q --help=target output.

No combination of anything I tried ever got gcc to emit addpd when SSE2 was disabled (either explicitly or simply not enabled with -m32). AFAIK, that would be a bug.

Upvotes: 1
