Reputation: 67
I'm trying to write a function in Assembly that will return the sum of the even elements in an array. This is my code
int sum(int *a, int n)
{
int S = 0;
_asm {
mov eax, [ebp + 8]
mov edx, 0
mov ebx, [ebp + 12]
mov ecx, 0
for1: cmp ecx, ebx
jge endfor
and [eax + ecx * 4], 1
jz even
inc ecx
jmp for1
even: add edx, [eax + ecx * 4]
inc ecx
jmp for1
endfor: mov S, edx
}
return S;
}
But it's not working. Does anybody know what it's wrong and how can I fix it? Thanks.
Upvotes: 1
Views: 1212
Reputation: 364842
Why even use inline asm at all if you want to grab the args off the stack yourself? Either let the compiler give you the args (so it still works after inlining), or write pure asm in a separate file.
Here's a version of your function written nicely. Note the indenting, and the more efficient code (branch structure, and loading into a register instead of comparing in memory and then using as a memory operand for add. If the condition was expected to be very unlikely, a test
or cmp
with a memory operand would make sense, though.)
int sum(const int *a, int n)
{
int S;
_asm {
mov ecx, n
xor eax,eax // Sum = 0
test ecx,ecx
jz .early_out // n==0 case
mov esi, a
lea edi, [esi + 4*ecx] // end pointer = &a[n]
xor ecx,ecx // zeroed register for CMOV
sum_loop: // do{
// Assume even/odd is unpredictable, so do it branchlessly:
mov edx, [esi]
test edx, 1 // ZF cleared for odd numbers only
cmovnz edx, ecx // zero out EDX for odd numbers only
add eax, edx // add zero or a[i]
add esi, 4
cmp esi, edi // while(++pointer < end_pointer)
jb sum_loop
early_out:
mov S, eax // MSVC does actually support leaving a value in EAX and falling off the end of a non-void function (without return),
// but clang -fasm-blocks doesn't. And there's no way to explicitly tell the compiler where the output(s) are in this dumb syntax, unlike GNU C inline asm
}
return S;
}
With a branch, the simple way is
.sum_loop:
mov edx, [esi]
test edx, 1
jnz .odd // conditionally skip the add
add eax, edx
.odd:
add esi, 4
cmp esi, edi // pointer < end pointer
jb .sum_loop
Your original branch structure (duplicating the tail of the loop for the odd/even branches) is a useful technique for some cases, but not here where they're identical again after the add to the total.
In asm, write loops with the conditional branch at the bottom, do{}while()
style. Check for conditions that imply zero iterations before entering the loop. You don't want a not-taken cmp
/jcc
and a taken unconditional branch.
lodsd
is just as efficient as mov eax, [esi]
/ add esi, 4
on Intel starting with Haswell. It's 3 uops on earlier Intel, and on AMD, so it saves code size at the expense of speed. In this case that would mean we couldn't have the sum in EAX, although that isn't important because the safe way to get data out of an asm block is to mov
-store it, not leave it in the return-value register. MSVC does seem to actually support the semantics of EAX being a return-value register even after inlining a function containing an asm{}
block, but I don't know if that's documented. And clang -fasm-blocks
certainly doesn't.
test dl, 1
would be slightly more efficient in the current code, fewer instruction bytes (3 instead of 6): there is no test r/m32, sign_extend_imm8
, only with a 32-bit immediate. So if you want to test a bit in the low byte, use the low-8-bit partial register. For simplicity I tested the same register we loaded into.
Loading into EAX would actually gain efficiency, allowing test al,1
as a 2-byte instruction instead of any other byte register being 3 bytes. Smaller code-size is typically better, all else equal, unless it happens to cause an unlucky alignment for some later code, e.g. the JCC erratum on Skylake if a macro-fused cmp/jcc
touches the end of a 32-byte block.
See the x86 tag wiki for guides and stuff, also https://stackoverflow.com/tags/inline-assembly/info
Upvotes: 0
Reputation: 20641
Just some guesswork (especially as I don't know what compiler you are using, and you weren't clear about what you meant by "not working"):
mov eax, [ebp + 8]
mov edx, 0
mov ebx, [ebp + 12]
mov ecx, 0
I assume this is supposed to load the two parameters a
and n
into registers. How do you know the offsets are correct? Is it possible to simply refer to the names directly?
and [eax + ecx * 4], 1
This destroys the element in the input array (setting it to 1 if odd or 0 if even), which you probably don't want. You should probably use the test
instruction (which is non-destructive) instead of and
.
even: add edx, [eax + ecx * 4]
This will add 0, since you have set [eax + ecx * 4]
to 0 via the and
instruction which I mention above. Based on this, I would expect your function to always return 0.
Upvotes: 4