Reputation: 6210
Hello everyone. I want to execute an inline assembly instruction of the following form:
BLENDPD xmm1,xmm2/m128, imm8
I am new to inline assembly, so I am having some difficulties. My code is:
#include<iostream>
using namespace std;
int main()
{
long long y;
__asm("blendpd %0,$0xabcd000000001111abcd000000001111,$0x1":
"=r" (y):
);
cout<<y;
return 0;
}
My first problem was getting a 128-bit operand, so I used the long hex number, but I still need the output to be 128 bits, since I want to be able to print y on the screen. Most of all, I know my __asm syntax is wrong but can't figure out where. I'm also not sure whether compiling with Intel or AT&T syntax makes a difference when using __asm.
Any help is welcome. Cheers! =)
Edit: I now have this version, and am getting an undefined function error.
#include<iostream>
#include<emmintrin.h>
using namespace std;
int main()
{
const int mask=5;
__m128d v2 = _mm_set_pd(1.0, 2.0);
__m128d v1;
v1=_mm_blend_pd(v1, v2, mask);
return 0;
}
Upvotes: 2
Views: 2738
Reputation: 9527
As an alternate answer to my other answer, here's how to do this with inline assembly rather than an intrinsic. (As Thomas Pornin notes on my other answer, intrinsics are generally better because they're more portable, but sometimes you want something like this too.)
First, I cheated -- I took the version with an intrinsic function, compiled it with -S, and looked at the resulting assembly code, which is:
movsd -64(%rbp), %xmm0
movhpd -56(%rbp), %xmm0
movsd -48(%rbp), %xmm1
movhpd -40(%rbp), %xmm1
blendpd $3, %xmm1, %xmm0
movlpd %xmm0, -64(%rbp)
movhpd %xmm0, -56(%rbp)
You can see here a few things different from your original code. First, note that the two 128-bit arguments are not immediates -- they're the xmm0 and xmm1 registers. Also, you've got the operands in the wrong order -- in the AT&T syntax that GCC uses by default, the mask goes first, and the register that contains the output goes last. Fix those, and the code compiles.
The second problem here is that you're storing the result from a general register into y, and the blendpd instruction doesn't touch general registers, so that's just storing garbage. You want the xmm0 register, which you get with =Yz (see GCC's documentation here). And you can't store that into a long long, which is 64 bits; you need a 128-bit vector variable. Solving all of those problems, the corrected code is:
#include<iostream>
#include<smmintrin.h>
using namespace std;
int main()
{
__m128d y;
__asm("blendpd $0x3,%%xmm1,%%xmm0":
"=Yz" (y):
);
// cout<<y;
return 0;
}
You'll note I had to comment out the cout line, as it has no provision for handling SSE vectors; you'd need to use the _mm_store_pd intrinsic to get the individual double values out of y first. (Or you could add more inline assembly using the movhpd and movlpd instructions to get the double values out of the register directly, rather than using a constraint to assign them to y.)
And there you have it -- that compiles and runs fine. Of course, the input values are undefined (whatever is randomly in those registers), so the output is garbage anyway -- you'd need to add something to load values into the registers first if you wanted to have a meaningful result.
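For a version with meaningful inputs, one option is to let the compiler pick the registers instead of hard-coding xmm0 and xmm1. The following is only a sketch, not the code from the answer above: blend_low_asm and to_array are hypothetical helper names, the "x" constraint asks GCC for any SSE register, and the template assumes the default AT&T dialect (it would need rewriting under -masm=intel).

```cpp
#include <emmintrin.h>  // __m128d, _mm_set_pd, _mm_storeu_pd (SSE2)

// Hypothetical helper (not from the answer): blends b into a with BLENDPD.
// Mask 0x1 takes the low element from b and keeps the high element of a.
// The CPU must support SSE4.1 at run time.
inline __m128d blend_low_asm(__m128d a, __m128d b)
{
    __asm__("blendpd $0x1, %1, %0"  // AT&T order: imm8, source, destination
            : "+x"(a)               // read-write operand in any xmm register
            : "x"(b));              // read-only operand in any xmm register
    return a;
}

// Hypothetical helper: copies both elements into a plain array for printing.
inline void to_array(__m128d v, double out[2])
{
    _mm_storeu_pd(out, v);  // unaligned store; out[0] is the low element
}
```

With a = _mm_set_pd(1.0, 2.0) and b = _mm_set_pd(3.0, 4.0), mask 0x1 replaces only the low element, so to_array yields {4.0, 1.0}, and those two doubles can be passed to cout as usual.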
Upvotes: 3
Reputation: 9527
First, for this sort of thing you very rarely need to use inline assembly. GCC generally provides "compiler intrinsic" functions which allow you to call a given special instruction using C function syntax rather than assembly syntax.
In this case, the intrinsic function you want is _mm_blend_pd(), and it has this function signature:
#include <smmintrin.h>
__m128d _mm_blend_pd(__m128d v1, __m128d v2, const int mask);
The compiler will replace that with the single blendpd instruction; this is not actually a function call.
The __m128d data type is a vector containing two double-precision float values; you can create one from two double values like so (the first argument becomes the high element, the second the low element):
__m128d v = _mm_set_pd(1.0, 2.0);
To retrieve the values from a vector to print them, you can store the vector into an array of double-precision floats:
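alignas(16) double a[2];  // _mm_store_pd requires a 16-byte-aligned destination
_mm_store_pd(a, v);       // a[0] is the low element, a[1] the high element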
All of this is based on the Intel Intrinsics manual at http://www.info.univ-angers.fr/~richer/ens/l3info/ao/intel_intrinsics.pdf; although this refers to the Intel C++ compiler, GCC supports the same syntax.
Edit: Replaced erroneous emmintrin.h with correct smmintrin.h. Also, note that the mask value needs to be 2-bit (one bit per value in the vector); values other than 0, 1, 2, or 3 produce an error. And of course you need to compile this with the -msse4 GCC option.
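To make the mask semantics concrete, here is a small sketch that runs _mm_blend_pd with each of the four legal masks. blend_all_masks is a hypothetical name, and the target attribute is a GCC/Clang extension I'm assuming is available; it enables SSE4.1 for that one function and can be dropped if the file is built with -msse4. _mm_storeu_pd is used so the output array needs no particular alignment.

```cpp
#include <smmintrin.h>  // _mm_blend_pd (SSE4.1); also pulls in the SSE2 intrinsics

// Hypothetical helper, not part of any library: blends two fixed vectors
// with every legal mask and stores each result as {low, high} in out.
__attribute__((target("sse4.1")))
void blend_all_masks(double out[4][2])
{
    const __m128d v1 = _mm_set_pd(1.0, 2.0);  // high = 1.0, low = 2.0
    const __m128d v2 = _mm_set_pd(3.0, 4.0);  // high = 3.0, low = 4.0

    // The mask must be a compile-time constant, so each call spells it out.
    // Bit 0 selects the low element, bit 1 the high element; a set bit
    // means "take this element from v2", a clear bit means "keep v1".
    _mm_storeu_pd(out[0], _mm_blend_pd(v1, v2, 0));  // both from v1
    _mm_storeu_pd(out[1], _mm_blend_pd(v1, v2, 1));  // low from v2
    _mm_storeu_pd(out[2], _mm_blend_pd(v1, v2, 2));  // high from v2
    _mm_storeu_pd(out[3], _mm_blend_pd(v1, v2, 3));  // both from v2
}
```

Called with a double[4][2] array, this fills the rows with {2, 1}, {4, 1}, {2, 3} and {4, 3} for masks 0 through 3. A mask such as 5 has a bit set outside those two positions, which is why the compiler rejects it.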
Upvotes: 5