Keeto
Keeto

Reputation: 4198

SSE2 instruction in C code

I am trying to reverse engineer a c code, but this part of assembly I cant really understand. I know it is part of the SSE extension. However, somethings are really different than what I am used to in x86 instructions.

static int sad16_sse2(void *v, uint8_t *blk2, uint8_t *blk1, int stride, int h)
{
    int ret;
    __asm__ volatile(
        "pxor %%xmm6, %%xmm6            \n\t"
        ASMALIGN(4)
        "1:                             \n\t"
        "movdqu (%1), %%xmm0            \n\t"
        "movdqu (%1, %3), %%xmm1        \n\t"
        "psadbw (%2), %%xmm0            \n\t"
        "psadbw (%2, %3), %%xmm1        \n\t"
        "paddw %%xmm0, %%xmm6           \n\t"
        "paddw %%xmm1, %%xmm6           \n\t"
        "lea (%1,%3,2), %1              \n\t"
        "lea (%2,%3,2), %2              \n\t"
        "sub $2, %0                     \n\t"
        " jg 1b                         \n\t"
        : "+r" (h), "+r" (blk1), "+r" (blk2)
        : "r" ((x86_reg)stride)
    );
    __asm__ volatile(
        "movhlps %%xmm6, %%xmm0         \n\t"
        "paddw   %%xmm0, %%xmm6         \n\t"
        "movd    %%xmm6, %0             \n\t"
        : "=r"(ret)
    );
    return ret;
}

What are the %1, %2, and %3? what does (%1,%2,%3) mean? Also what does "+r", "-r", "=r" mean?

Upvotes: 1

Views: 1137

Answers (2)

Gunther Piez
Gunther Piez

Reputation: 30449

The inline assembler works similar to a macro preprocessor. Operands with exactly one leading percent are replaced by the the input parameters in the order as they appear in the parameter list, in this case:

%0    h                output, register, r/w
%1    blk1             output, register, r/w
%2    blk2             output, register, r/w
%3    (x86_reg)stride  input, register, read only

The parameters are normal C expressions. They can be further specified by "constraints", in this case "r" means the value should be in a register, opposed to "m" which is a memory operand. The constraint modifier "=r" makes this a write-only operand, "+r" is a read-write operand and "r" and normal read operand.

After the first colon the output operands appear, after the second the input operands and after the optional third the clobbered registers.

So the instruction sequence calculates the sum of the absolute differences in each byte of blk1 and blk2. This happens in 16 byte blocks, so if stride is 16, the blocks are contiguous, otherwise there are holes. Each instruction appears twice because some minimal loop unrolling is done, the h parameter is the number of 32 byte blocks to process. The second asm block seems to be useless, as the psadbw instruction sums up only in the low 16 bit of the destination register. (Did you omit some code?)

Upvotes: 0

Jonathon Reinhart
Jonathon Reinhart

Reputation: 137517

You'll want to have a look at this GCC Inline Asssembly HOWTO.

The percent sign numbers are the instruction operands.

Upvotes: 2

Related Questions