Reputation: 4198
I am trying to reverse engineer a c code, but this part of assembly I cant really understand. I know it is part of the SSE extension. However, somethings are really different than what I am used to in x86 instructions.
static int sad16_sse2(void *v, uint8_t *blk2, uint8_t *blk1, int stride, int h)
{
int ret;
__asm__ volatile(
"pxor %%xmm6, %%xmm6 \n\t"
ASMALIGN(4)
"1: \n\t"
"movdqu (%1), %%xmm0 \n\t"
"movdqu (%1, %3), %%xmm1 \n\t"
"psadbw (%2), %%xmm0 \n\t"
"psadbw (%2, %3), %%xmm1 \n\t"
"paddw %%xmm0, %%xmm6 \n\t"
"paddw %%xmm1, %%xmm6 \n\t"
"lea (%1,%3,2), %1 \n\t"
"lea (%2,%3,2), %2 \n\t"
"sub $2, %0 \n\t"
" jg 1b \n\t"
: "+r" (h), "+r" (blk1), "+r" (blk2)
: "r" ((x86_reg)stride)
);
__asm__ volatile(
"movhlps %%xmm6, %%xmm0 \n\t"
"paddw %%xmm0, %%xmm6 \n\t"
"movd %%xmm6, %0 \n\t"
: "=r"(ret)
);
return ret;
}
What are the %1, %2, and %3? what does (%1,%2,%3) mean? Also what does "+r", "-r", "=r" mean?
Upvotes: 1
Views: 1137
Reputation: 30449
The inline assembler works similar to a macro preprocessor. Operands with exactly one leading percent are replaced by the the input parameters in the order as they appear in the parameter list, in this case:
%0 h output, register, r/w
%1 blk1 output, register, r/w
%2 blk2 output, register, r/w
%3 (x86_reg)stride input, register, read only
The parameters are normal C expressions. They can be further specified by "constraints", in this case "r" means the value should be in a register, opposed to "m" which is a memory operand. The constraint modifier "=r" makes this a write-only operand, "+r" is a read-write operand and "r" and normal read operand.
After the first colon the output operands appear, after the second the input operands and after the optional third the clobbered registers.
So the instruction sequence calculates the sum of the absolute differences in each byte of blk1
and blk2
. This happens in 16 byte blocks, so if stride
is 16, the blocks are contiguous, otherwise there are holes. Each instruction appears twice because some minimal loop unrolling is done, the h
parameter is the number of 32 byte blocks to process. The second asm block seems to be useless, as the psadbw
instruction sums up only in the low 16 bit of the destination register. (Did you omit some code?)
Upvotes: 0
Reputation: 137517
You'll want to have a look at this GCC Inline Asssembly HOWTO.
The percent sign numbers are the instruction operands.
Upvotes: 2