Reputation: 778
I have a very simple C++ program (a minimal example of what I am actually doing) using SSE2 intrinsics.
#include <xmmintrin.h>
int main(){
__m128d a = {0,0};
__m128d b = {1,1};
__m128d c = a + b;
int t = c[0] >= 1;
return t;
}
I would like to check that the addition is indeed compiled to vectorized instructions. I compile the file with g++ -S test.cpp
My understanding is that if I don't pass the -msse2 flag to g++, SSE2 is not enabled. This seems to be confirmed by the output of g++ -Q --help=target
-msse [disabled]
-msse2 [disabled]
-msse2avx [disabled]
-msse3 [disabled]
-msse4 [disabled]
-msse4.1 [disabled]
-msse4.2 [disabled]
-msse4a [disabled]
However, when looking at the assembly code, the addpd instruction seems to be used.
main:
.LFB499:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $80, %rsp
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
pxor %xmm0, %xmm0
movaps %xmm0, -48(%rbp)
movapd .LC0(%rip), %xmm0
movaps %xmm0, -32(%rbp)
movapd -48(%rbp), %xmm0
addpd -32(%rbp), %xmm0
movaps %xmm0, -64(%rbp)
movsd -64(%rbp), %xmm0
pxor %xmm1, %xmm1
ucomisd %xmm1, %xmm0
setnb %al
movzbl %al, %eax
movl %eax, -68(%rbp)
movl -68(%rbp), %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L3
call __stack_chk_fail
.L3:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE499:
.size main, .-main
.section .rodata
.align 16
.LC0:
.long 0
.long 1072693248
.long 0
.long 1072693248
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
I see a contradiction here, which makes me think there is something I don't understand. Is SSE2 enabled or not?
Upvotes: 1
Views: 1395
Reputation: 365576
I can't repro your results.
x86-64 g++ does enable -msse and -msse2 by default. You can disable SSE code-gen in 64-bit mode with -mno-sse (even though SSE2 is baseline for x86-64), in which case gcc implements the + operator with x87 fld / faddp.
__m128d is defined as a GNU C native vector with two double elements, and you didn't use any intrinsics. If you'd used _mm_set_pd or _mm_add_pd instead of the GNU-extension syntax (which treats them as native vectors, with {} braced init lists and the + operator), you'd get this with SSE disabled:
<source>:5:13: error: SSE register return with SSE disabled
__m128d c = _mm_add_pd(a, b);
The interesting thing is that even with SSE2 disabled, gcc will still parse xmmintrin.h without error, but only at -O0. With optimization enabled, it notices all the (inline) functions that return in an SSE register with SSE disabled, even if you don't call them.
You could work around that by defining a vector type yourself, e.g. typedef double v2d __attribute__((vector_size(16)));
On the Godbolt compiler explorer, gcc8.2 -m32 is configured with SSE2 enabled by default (even though SSE2 is not baseline for 32-bit in general). But gcc6.3 -m32 doesn't enable SSE2 by default, as you can see in the -Q --help=target output.
No combination of options I tried ever got gcc to emit addpd when SSE2 was disabled (either explicitly or simply not enabled with -m32). AFAIK, that would be a bug.
Upvotes: 1