arunmoezhi

Reputation: 3200

assembly intrinsic to do a masked load

#include <stdio.h>

int main()
{
    const int STRIDE=2,SIZE=8192;
    int i=0;
    double u[SIZE][STRIDE];
    #pragma vector aligned
    for(i=0;i<SIZE;i++)
    {
        u[i][STRIDE-1] = i;
    }
    printf("%lf\n",u[7][STRIDE-1]);
    return 0;
}

The compiler uses xmm registers here. Because of the stride-2 access pattern, I want the compiler to ignore the stride, do a regular full-width load from memory, and then mask the alternate elements, so that only 50% of each SIMD register holds useful data. I need intrinsics that can load, mask the register bitwise, and then store it back to memory.

P.S.: I have never done assembly coding before.

Upvotes: 2

Views: 2166

Answers (3)

jilles

Reputation: 11252

Without AVX, half a SIMD register is only one double anyway, so there seems little wrong with regular 64-bit stores.

If you want to use masked stores (MASKMOVDQU/MASKMOVQ), note that they write directly to DRAM, just like non-temporal stores such as MOVNTPS. This may or may not be what you want. If the data fits in cache and you plan to read it again soon, it is likely better not to use them.
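For reference, a minimal sketch of MASKMOVDQU through its SSE2 intrinsic _mm_maskmoveu_si128 (the helper name here is invented just for illustration). The mask has byte granularity, and only the bytes whose mask byte has its high bit set are written:

#include <emmintrin.h>

/* Write only the upper double of v into pair[1]; pair[0] is left untouched.
   MASKMOVDQU stores just the bytes selected by the mask, with a
   non-temporal hint. */
static void store_upper_double(double *pair, __m128d v)
{
    __m128i mask = _mm_set_epi8(-128, -128, -128, -128, -128, -128, -128, -128,
                                0, 0, 0, 0, 0, 0, 0, 0);   /* bytes 8..15 enabled */
    _mm_maskmoveu_si128(_mm_castpd_si128(v), mask, (char *)pair);
}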

Certain AMD processors can do a 64-bit non-temporal store from an XMM register using MOVNTSD; this may simplify things slightly compared to MASKMOVDQU.
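If the compiler exposes the SSE4a intrinsic _mm_stream_sd (GCC and Clang do, via x86intrin.h with -msse4a), a sketch of that form looks roughly like this:

#include <x86intrin.h>   /* build with -msse4a; SSE4a is AMD-only */

/* 64-bit non-temporal store of the low double of v (MOVNTSD). */
static void stream_low_double(double *dst, __m128d v)
{
    _mm_stream_sd(dst, v);
}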

Upvotes: 0

arunmoezhi

Reputation: 3200

A masked store with a mask value of 0xAA (10101010), i.e. only the alternate elements are written.
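A sketch of one way this can look with intrinsics (assuming AVX2 and its _mm256_maskstore_pd; with 4 doubles per register, the per-iteration mask is the 1010 half of the 0xAA pattern, selecting only the odd elements u[i][STRIDE-1]):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    enum { STRIDE = 2, SIZE = 8192 };
    static double u[SIZE][STRIDE];
    double *flat = &u[0][0];

    /* Sign bit set in lanes 1 and 3 only (the 1010 pattern): the masked
       store writes the odd doubles and leaves the even ones untouched. */
    const __m256i mask = _mm256_set_epi64x(-1, 0, -1, 0);

    for (long i = 0; i < SIZE; i += 2)
    {
        /* lanes 0..3 map to u[i][0], u[i][1], u[i+1][0], u[i+1][1] */
        __m256d vals = _mm256_set_pd((double)(i + 1), 0.0, (double)i, 0.0);
        _mm256_maskstore_pd(&flat[2 * i], mask, vals);
    }

    printf("%lf\n", u[7][STRIDE - 1]);
    return 0;
}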

Upvotes: 2

Brendan

Reputation: 37222

You can't do a masked load (only a masked store). The easiest alternative would be to do a load and then mask it yourself (e.g. using intrinsics).
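For example, a minimal sketch of the load-then-mask approach for one pair of doubles (SSE2 only; the helper name is just for illustration):

#include <emmintrin.h>

/* Load both doubles, replace only the upper one in the register,
   then store both back -- an ordinary load/store with the "masking"
   done in the register rather than in the store. */
static void set_upper(double *pair, double value)
{
    __m128d old = _mm_loadu_pd(pair);        /* {pair[0], pair[1]} */
    __m128d neu = _mm_set_pd(value, 0.0);    /* upper lane = value */
    __m128d res = _mm_move_sd(neu, old);     /* {pair[0], value}   */
    _mm_storeu_pd(pair, res);
}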

A potentially better alternative would be to change your array to "double u[STRIDE][SIZE];" so that you don't need to mask anything and don't end up with half an XMM register wasted/masked.
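A sketch of the transposed layout (assuming nothing else depends on the original ordering); each row is now contiguous, so the loop vectorizes as plain unit-stride stores:

#include <stdio.h>

int main(void)
{
    enum { STRIDE = 2, SIZE = 8192 };
    static double u[STRIDE][SIZE];   /* dimensions swapped */
    int i;

    for (i = 0; i < SIZE; i++)
        u[STRIDE - 1][i] = i;        /* unit-stride, fully vectorizable */

    printf("%lf\n", u[STRIDE - 1][7]);
    return 0;
}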

Upvotes: 0
