Syntax_Error

Reputation: 6210

SSE4 inline assembly problems in C++

Hello everyone. I want to execute an inline assembly instruction of the following form:

BLENDPD xmm1,xmm2/m128, imm8

I am new to inline assembly, so I am having some difficulties. My code is:

#include<iostream>
using namespace std;
int main()
{
  long long y;
  __asm("blendpd %0,$0xabcd000000001111abcd000000001111,$0x1":
    "=r" (y):
    );
  cout<<y;
  return 0;
}

My first problem was supplying a 128-bit operand, so I used the long hex number, but I still need the output to be 128 bits since I want to be able to print y on the screen. Most of all, I know my __asm syntax is wrong but can't figure out where, and I'm not sure whether compiling with Intel or AT&T syntax makes a difference when using __asm.

Any help is welcome. Cheers! =)

Edit: I now have this version, and am getting an undefined function error.

  #include<iostream>
  #include<emmintrin.h>
  using namespace std;

int main()
{
const int mask=5;
__m128d v2 = _mm_set_pd(1.0, 2.0);
__m128d v1;
v1=_mm_blend_pd(v1, v2, mask);
return 0;
}

Upvotes: 2

Views: 2738

Answers (2)

Brooks Moses

Reputation: 9527

As an alternate answer to my other answer, here's how to do this with inline assembly rather than an intrinsic. (As Thomas Pornin notes on my other answer, intrinsics are generally better because they're more portable, but sometimes you want something like this too.)

First, I cheated -- I took the version with an intrinsic function, and compiled it with -S, and looked at the resulting assembly code, which is:

    movsd   -64(%rbp), %xmm0
    movhpd  -56(%rbp), %xmm0
    movsd   -48(%rbp), %xmm1
    movhpd  -40(%rbp), %xmm1
    blendpd $3, %xmm1, %xmm0
    movlpd  %xmm0, -64(%rbp)
    movhpd  %xmm0, -56(%rbp)

You can see here a few things different from your original code. First, note that the two 128-bit arguments are not immediates -- they're the xmm0 and xmm1 registers. Also, you've got the operands in the wrong order for the AT&T syntax that GCC uses by default -- the mask goes first, and the register that contains the output goes last. Fix those, and the code compiles.

The second problem here is that you're storing the result from a general register into y, and the blendpd instruction doesn't touch general registers, so that's just storing garbage. You want the xmm0 register, which you get with =Yz (See GCC's documentation here). And you can't store that into a long long, which is 64 bits; you need a 128-bit vector variable. Solving all of those problems, the corrected code is:

#include<iostream>
#include<smmintrin.h>
using namespace std;
int main()
{
  __m128d y;
  __asm("blendpd $0x3,%%xmm1,%%xmm0":
    "=Yz" (y):
    );
  // cout<<y;
  return 0;
}

You'll note I had to comment out the cout line, as it has no provision for handling SSE vectors; you'd need to use the _mm_store_pd intrinsic to get the individual double values out of y first. (Or you could add more inline assembly to call the movlpd and movhpd instructions to get the double values out of the register directly, rather than using a constraint to assign them to y.)

And there you have it -- that compiles and runs fine. Of course, the input values are undefined (whatever is randomly in those registers), so the output is garbage anyway -- you'd need to add something to load values into the registers first if you wanted to have a meaningful result.
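To get a meaningful result, the inputs can be passed in through constraints instead of being left to whatever is in the registers. The following is a minimal sketch, not a definitive version: it assumes GCC or Clang on x86-64, the default AT&T syntax, and a CPU that supports SSE4.1. The helper name blend_low_from is made up for illustration, and it uses the generic "x" (any SSE register) constraint rather than =Yz so the compiler picks the registers.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_set_pd, _mm_storeu_pd

// Hypothetical helper: blends with a fixed mask of 1, so the low element
// is taken from src and the high element is kept from dst.
inline __m128d blend_low_from(__m128d dst, __m128d src)
{
    // AT&T operand order: immediate mask, source, destination.
    // "+x" marks dst as a read/write SSE register, "x" marks src as an input.
    __asm__("blendpd $1, %1, %0" : "+x"(dst) : "x"(src));
    return dst;
}
```

Calling blend_low_from(_mm_set_pd(10.0, 20.0), _mm_set_pd(30.0, 40.0)) and storing the result with _mm_storeu_pd should leave 40.0 in the low element and 10.0 in the high one. Because the operands go through constraints, the compiler loads the registers itself and no explicit movsd/movhpd instructions are needed.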

Upvotes: 3

Brooks Moses

Reputation: 9527

First, for this sort of thing you very rarely need to use inline assembly. GCC generally provides "compiler intrinsic" functions which allow you to call a given special instruction using C function syntax rather than assembly syntax.

In this case, the intrinsic function you want is _mm_blend_pd(), and it has this function signature:

#include <smmintrin.h>
__m128d _mm_blend_pd(__m128d v1, __m128d v2, const int mask);

The compiler will replace that with the single blendpd instruction; this is not actually a function call.

The __m128d data type is a vector containing two double-precision float values; you can create one from two doubles like so (note that the first argument goes into the high element and the second into the low element):

__m128d v = _mm_set_pd(1.0, 2.0);

To retrieve the values from a vector to print them, you can store the vector into an array of double-precision floats:

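alignas(16) double a[2];  // _mm_store_pd requires a 16-byte-aligned address
_mm_store_pd(a, v);       // a[0] receives the low element, a[1] the high element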
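As a small sketch of the set/store round trip described above, the helper below copies a vector into a plain array. The name unpack is hypothetical, and it uses _mm_storeu_pd (the unaligned variant of _mm_store_pd) so that the destination array needs no particular alignment. Only SSE2 is involved here, so no extra compiler flag should be needed on x86-64.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_set_pd, _mm_storeu_pd

// Hypothetical helper: writes the low element of v to out[0]
// and the high element to out[1].
inline void unpack(__m128d v, double out[2])
{
    _mm_storeu_pd(out, v);  // unaligned store: out may have any alignment
}
```

With v = _mm_set_pd(1.0, 2.0), unpack leaves 2.0 in out[0] and 1.0 in out[1], because _mm_set_pd takes its arguments high element first.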

All of this is based on the Intel Intrinsics manual at http://www.info.univ-angers.fr/~richer/ens/l3info/ao/intel_intrinsics.pdf; although this refers to the Intel C++ compiler, GCC supports the same syntax.

Edit: Replaced erroneous emmintrin.h with correct smmintrin.h. Also, note that the mask value needs to be 2-bit (one bit per value in the vector); values other than 0, 1, 2, or 3 produce an error. And of course you need to compile this with the -msse4 GCC option.
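To make the meaning of the mask concrete, here is a hedged sketch of a wrapper around _mm_blend_pd. The name blend is made up for illustration. It assumes GCC or Clang on x86-64; the target attribute is a compiler extension that enables SSE4.1 for this one function, and compiling the whole file with -msse4 is the alternative described above. Since the intrinsic needs a compile-time constant, the wrapper switches over the four legal values and silently ignores any higher bits of the runtime argument.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_blend_pd (also provides the SSE2 intrinsics)

// Hypothetical helper. Bit 0 of mask selects the low element and bit 1 the
// high element: a clear bit takes the element from v1, a set bit from v2.
__attribute__((target("sse4.1")))
inline __m128d blend(__m128d v1, __m128d v2, int mask)
{
    switch (mask & 3) {  // only the low two bits are meaningful for blendpd
        case 0:  return _mm_blend_pd(v1, v2, 0);  // both elements from v1
        case 1:  return _mm_blend_pd(v1, v2, 1);  // low from v2, high from v1
        case 2:  return _mm_blend_pd(v1, v2, 2);  // low from v1, high from v2
        default: return _mm_blend_pd(v1, v2, 3);  // both elements from v2
    }
}
```

For v1 = _mm_set_pd(1.0, 2.0) and v2 = _mm_set_pd(3.0, 4.0), a mask of 1 should give a low element of 4.0 and a high element of 1.0, while a mask of 2 should give 2.0 and 3.0. In real code the mask is usually a literal, in which case calling _mm_blend_pd directly is simpler.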

Upvotes: 5
