SSE4 inline assembly problems in C++

Question

Hello everyone, I want to execute an inline assembly instruction of the following form:

BLENDPD xmm1,xmm2/m128, imm8

I am new to inline assembly, so I am having some difficulties. My code is:

#include <iostream>
using namespace std;
int main()
{
  long long y;
  __asm("blendpd %0,$0xabcd000000001111abcd000000001111,$0x1":
    "=r" (y):
    );
  cout<<y<<endl;
  return 0;
}

My first error was about getting a 128-bit operand, so I used the long hex number, but I still need the output to be 128 bits, since I want to be able to print y on the screen. Most of all, I know my __asm syntax is wrong but can't figure out where. I'm also not sure whether compiling with Intel or AT&T syntax makes a difference when using __asm.

Any help is welcome.  Cheers!  =)

Edit: I now have this version, and am getting an undefined function error.

#include <iostream>
#include <smmintrin.h>
using namespace std;

int main()
{
  const int mask=5;
  __m128d v2 = _mm_set_pd(1.0, 2.0);
  __m128d v1;
  v1=_mm_blend_pd(v1, v2, mask);
  return 0;
}

Brooks Moses · Accepted Answer

As an alternative to my other answer, here's how to do this with inline assembly rather than an intrinsic. (As Thomas Pornin notes on my other answer, intrinsics are generally better because they're more portable, but sometimes you want something like this too.)

First, I cheated -- I took the version with an intrinsic function, and compiled it with -S, and looked at the resulting assembly code, which is:

    movsd   -64(%rbp), %xmm0
    movhpd  -56(%rbp), %xmm0
    movsd   -48(%rbp), %xmm1
    movhpd  -40(%rbp), %xmm1
    blendpd $3, %xmm1, %xmm0
    movlpd  %xmm0, -64(%rbp)
    movhpd  %xmm0, -56(%rbp)

You can see here a few things different from your original code. First, note that the two 128-bit arguments are not immediates -- they're the xmm0 and xmm1 registers. Also, you've got the operands in the wrong order: GCC emits and, by default, expects AT&T syntax, which reverses the Intel order shown in the instruction reference, so the mask goes first and the register that contains the output goes last. Fix those, and the code compiles.
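To make the mask's role concrete, here is a minimal plain C++ model of what blendpd computes, written from the instruction's documented behavior rather than taken from the code above. The name blendpd_model and the use of std::array are illustrative assumptions, and no SSE hardware is involved: bit i of the mask picks lane i from the source, and a clear bit leaves the destination lane alone.

```cpp
#include <array>

// Plain C++ model of BLENDPD's lane selection (illustrative only, not the
// real instruction). Lane 0 is the low double, lane 1 is the high double.
std::array<double, 2> blendpd_model(std::array<double, 2> dst,
                                    const std::array<double, 2>& src,
                                    unsigned imm8)
{
    for (unsigned lane = 0; lane < 2; ++lane) {
        // If bit `lane` of the mask is set, take that lane from src;
        // otherwise dst keeps its own value.
        if (imm8 & (1u << lane)) {
            dst[lane] = src[lane];
        }
    }
    return dst;
}
```

With the $3 mask from the listing, both bits are set, so both lanes of the destination are replaced by the source.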

The second problem here is that you're storing the result from a general register into y, and the blendpd instruction doesn't touch general registers, so that's just storing garbage. You want the xmm0 register, which you get with the =Yz constraint (see the machine constraints section of GCC's documentation). And you can't store that into a long long, which is 64 bits; you need a 128-bit vector variable. Solving all of those problems, the corrected code is:

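#include <iostream>
#include <emmintrin.h>  // defines __m128d
using namespace std;
int main()
{
  __m128d y;
  // AT&T order: mask, source register, destination register.
  __asm("blendpd $0x3,%%xmm1,%%xmm0":
    "=Yz" (y):   // output operand: read y from xmm0
    );
  // cout<<y<<endl;
  return 0;
}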

You'll note I had to comment out the cout line, as it has no provision for handling SSE vectors; you'd need to use the _mm_store_pd intrinsic to get the individual double values out of y first.  (Or you could add more inline assembly that uses the movhpd and movlpd instructions to get the double values out of the register directly, rather than using a constraint to assign them to y.)

And there you have it -- that compiles and runs fine.  Of course, the input values are undefined (whatever is randomly in those registers), so the output is garbage anyway -- you'd need to add something to load values into the registers first if you wanted to have a meaningful result.
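One way to get defined inputs is to let the compiler load them by passing both vectors as asm operands. The following is a sketch under some assumptions: GCC or Clang extended asm, an x86-64 CPU that supports SSE4.1, and the generic "x" constraint (any SSE register) in place of =Yz. blend_asm is a hypothetical name, and the mask 0x1 is just an example value.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_set_pd, _mm_storeu_pd

// Hypothetical helper: blend two vectors with BLENDPD through extended
// inline assembly. Assumes the CPU supports SSE4.1.
inline __m128d blend_asm(__m128d dst, __m128d src)
{
    // AT&T operand order: immediate mask, source, destination.
    // "+x" marks dst as read and written, held in any SSE register;
    // "x" marks src as an input held in any SSE register.
    // Mask 0x1: bit 0 is set, so the low double comes from src;
    // bit 1 is clear, so the high double of dst is kept.
    __asm__("blendpd $0x1, %1, %0" : "+x"(dst) : "x"(src));
    return dst;
}
```

Because the registers are chosen by the compiler and named through %0 and %1, nothing is left to whatever happened to be in xmm0 and xmm1. For example, blend_asm(_mm_set_pd(10.0, 20.0), _mm_set_pd(1.0, 2.0)) yields a vector whose low double is 2.0 and whose high double is 10.0.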
