Reputation: 1864
I am trying to annotate a disassembly block for exam practice. Here's what I have done so far:
00000190 <mystery>:
190: 2300 movs r3, #0 // move address 190 (offset 0) into r3 ?
192: e004 b.n 19e <mystery+0xe> // if 19e then branch to mystery
194: f010 0f01 tst.w r0, #1 ; 0x1 // update flags to 1 in status register
198: bf18 it ne // if 198 not equal to ??? then ???
19a: 3301 addne r3, #1 // add to r3 if not equal to 19a offset 1?
19c: 1040 asrs r0, r0, #1 // shift r0 right one spot (leave it in r0)
19e: 2800 cmp r0, #0 // compare contents of r0 against 0 ?
1a0: d1f8 bne.n 194 <mystery+0x4> // branch to 194 if not equal to something at line 194?
1a2: 4618 mov r0, r3 // move r3 wholecloth into r0
1a4: 4770 bx lr // branch(return from the mystery function)
1a6: bf00 nop // No operation
So my comments are pretty rudimentary and likely to be massively incorrect but most of all I really don't understand what instructions such as those at 190 or 19a mean. There are only two arguments instead of three, so how do these work?
Taking as an example
19a: 3301 addne r3, #1
My interpretation of this so far is: if not equal to X, then add Y to r3? What are X and Y? Should I be using the result from the previous line? If so, which argument (of the standard three) does it take the place of?
Blah!
I am willing to accept that I have no idea what I am doing and am completely misinterpreting everything.
Please send help!
Upvotes: 3
Views: 4782
Reputation: 71536
If you look at the ARM ARM (ARM Architecture Reference Manual), it has a section near the front about the flags. Unlike many other instruction sets, in the ARM flavor in particular (not thumb) the top four bits of every instruction are condition bits. With ARM you can conditionally execute almost any instruction; most other processors only allow conditional branches. The condition codes (eq, ne, cs, cc, etc.) are listed in that early section. So an add that executes only if the zero flag is clear would be addne. Also unlike most other processors, ARM (in arm mode) lets you choose when you want to destroy/write the flags. Most others would always update the flags on an add, for example; arm only does so if you add the s: add does not, adds does. It gets tricky when you combine conditional execution with these other modifiers, for example is it addsne or addnes? The older syntax puts the condition first (addnes), while the unified syntax puts the s first (addsne); I use combinations like that so rarely that I don't have it memorized.
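To make the "top four bits" point concrete, here is a minimal C sketch of how a condition field could be checked against the flags. The struct and function names are hypothetical, chosen only for illustration, and treating 0xF as "always" is a simplification rather than what every architecture version does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four status flags (names are illustrative). */
struct flags { bool n, z, c, v; };

/* Returns true when the condition held in bits 31..28 of a 32-bit
   ARM-mode instruction word is satisfied by the given flags. */
bool cond_passed(uint32_t insn, struct flags f)
{
    switch (insn >> 28) {
    case 0x0: return f.z;                  /* eq: Z set */
    case 0x1: return !f.z;                 /* ne: Z clear */
    case 0x2: return f.c;                  /* cs: C set */
    case 0x3: return !f.c;                 /* cc: C clear */
    case 0x4: return f.n;                  /* mi: N set */
    case 0x5: return !f.n;                 /* pl: N clear */
    case 0x6: return f.v;                  /* vs: V set */
    case 0x7: return !f.v;                 /* vc: V clear */
    case 0x8: return f.c && !f.z;          /* hi */
    case 0x9: return !f.c || f.z;          /* ls */
    case 0xA: return f.n == f.v;           /* ge */
    case 0xB: return f.n != f.v;           /* lt */
    case 0xC: return !f.z && f.n == f.v;   /* gt */
    case 0xD: return f.z || f.n != f.v;    /* le */
    default:  return true;                 /* 0xE = al; 0xF treated as "always" here (simplification) */
    }
}
```

Any word whose top nibble is 0x1 behaves like an "ne" instruction in this model: it passes only while Z is clear.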
As already mentioned, the disassembler produces output that cannot be fed straight back into the assembler; there are additional items on the output to help you decode the instruction.
It looks like you are looking at thumb2 code, which is a frankenstein mixture of ARM and thumb. So you are going to have some arm features and some thumb features and, at least with binutils, some annoying binutils-isms (I don't have an ARM toolchain anymore to compare). For example, even though we know that many thumb instructions modify the flags without it being an option, and the disassembler shows this by giving adds instead of add, you cannot use adds r1,r2 for thumb mode as it complains; it wants you to use add r1,r2 even though you are modifying the flags. ARM is working to push a unified arm/thumb assembly syntax, which probably already works with their toolchain, but we will have to see what happens with the GNU tools.
So I wouldn't expect to be able to take the disassembly output and re-assemble that syntax, for those two reasons. The extra stuff is there to help you understand the specific instruction that was encoded.
Upvotes: 2
Reputation: 25278
1) The TST instruction is basically the same as ANDS, except that it discards the result instead of writing it to a register. So, TST r0, #1 sets flags based on the result of (r0 & 1). Specifically, it will set the Z (zero) flag if the result was zero, i.e. bit 0 of r0 was not set.
2) IT stands for "If-Then". It checks the condition indicated, and conditionally executes up to 4 following instructions. In your example you have only one conditional instruction, which the disassembler helpfully provided with the NE suffix from the IT instruction (the suffix is not encoded in the instruction itself for Thumb-2). NE means "not equal", but in this case there was no comparison, so what gives? The trick is that the equality check checks the Z flag, so you can think of this one as "not Zero". So, our ADD will be executed in case the Z flag was not set, i.e. r0 did have bit 0 set.
3) A similar situation happens around CMP/BNE. CMP basically subtracts operands and sets the flags based on the result. In our case, it will set Z if r0 was equal to 0. Next, BNE will test the Z flag and branch if it was not set (i.e. r0 was not equal to 0).
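The three points above can be modelled with a few small C helpers. This is only a sketch: the function names are made up for illustration, and only the Z flag is modelled. The IT helpers assume the 16-bit encoding in which bits 7..4 hold the first condition and bits 3..0 hold the mask; that assumption is consistent with bf18 being shown as "it ne" in the listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Z flag after "tst rn, #imm": set when (rn & imm) is zero. */
bool z_after_tst(uint32_t rn, uint32_t imm)
{
    return (rn & imm) == 0;
}

/* Z flag after "cmp rn, #imm": set when rn - imm is zero. */
bool z_after_cmp(uint32_t rn, uint32_t imm)
{
    return (rn - imm) == 0;
}

/* First condition of a 16-bit IT instruction (bits 7..4); 1 means "ne". */
unsigned it_firstcond(uint16_t it)
{
    return (it >> 4) & 0xF;
}

/* Number of instructions covered by the IT block, derived from the
   position of the lowest set bit in the 4-bit mask (bits 3..0). */
unsigned it_block_length(uint16_t it)
{
    unsigned mask = it & 0xF;
    unsigned len = 4;

    if (mask == 0)
        return 0;               /* a zero mask is not an IT instruction */
    while ((mask & 1) == 0) {   /* each trailing zero bit shortens the block */
        mask >>= 1;
        len--;
    }
    return len;
}
```

With the halfword from the listing, it_firstcond(0xbf18) gives 1 (ne) and it_block_length(0xbf18) gives 1, matching the single addne that follows.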
Converting it all to pseudo-C, we get:
r3 = 0;
goto test_loop;
loop:
    Z = (r0 & 1) == 0;
    if (!Z)
        r3 += 1;
    r0 = r0 >> 1;
test_loop:
    Z = (r0 - 0) == 0;
    if (!Z) goto loop;
r0 = r3;
return;
Or, in "normal" C:
r3 = 0;
while ( r0 != 0 )
{
    if ( r0 & 1 )
        r3++;
    r0 >>= 1;
}
return r3;
Looks like it's counting bits in r0.
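For anyone who wants to check that reading, here is a compilable sketch of the same loop. The signed parameter type is my assumption, based on the listing using asrs (an arithmetic shift); the original source is not shown, so treat the signature as a guess.

```c
#include <stdint.h>

/* Counts the set bits of r0 by testing bit 0 and shifting right until
   the value reaches zero. The signed type is an assumption (see above). */
int32_t mystery(int32_t r0)
{
    int32_t r3 = 0;

    while (r0 != 0) {
        if (r0 & 1)     /* tst r0, #1 / it ne / addne r3, #1 */
            r3++;
        r0 >>= 1;       /* asrs r0, r0, #1 */
    }
    return r3;          /* mov r0, r3 / bx lr */
}
```

One caveat: with an arithmetic shift, a negative input never reaches zero (it settles at -1), so this sketch is only meaningful for non-negative arguments.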
Have a look here for the table of condition codes and what flags they check. This describes how and when the flags are set.
Edit: I just reread your question and realized one source of your confusion. In a line like this:
b.n 19e <mystery+0xe>
there is one operand, not two. The disassembler tries to be helpful and shows not just the absolute destination address (19e) but also its representation as an offset from the nearest symbol (mystery is at 190, so 19e is mystery+0xe).
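The addresses the disassembler prints can be recomputed from the raw halfwords. The sketch below uses hypothetical helper names and assumes the 16-bit Thumb branch encodings (11100 imm11 for the unconditional form, 1101 cond imm8 for the conditional form), with the offset counted in halfwords from the instruction address plus 4.

```c
#include <stdint.h>

/* Target of a 16-bit unconditional Thumb branch located at addr. */
uint32_t b_n_target(uint32_t addr, uint16_t insn)
{
    int32_t imm11 = insn & 0x7FF;

    if (imm11 & 0x400)
        imm11 -= 0x800;                     /* sign-extend 11 bits */
    return addr + 4 + (uint32_t)(imm11 * 2);
}

/* Target of a 16-bit conditional Thumb branch located at addr. */
uint32_t bcond_n_target(uint32_t addr, uint16_t insn)
{
    int32_t imm8 = insn & 0xFF;

    if (imm8 & 0x80)
        imm8 -= 0x100;                      /* sign-extend 8 bits */
    return addr + 4 + (uint32_t)(imm8 * 2);
}

/* Offset the disassembler prints after the symbol name, e.g. mystery+0xe. */
uint32_t symbol_offset(uint32_t target, uint32_t symbol)
{
    return target - symbol;
}
```

Feeding in the values from the listing, b_n_target(0x192, 0xe004) yields 0x19e and bcond_n_target(0x1a0, 0xd1f8) yields 0x194, and symbol_offset(0x19e, 0x190) is 0xe, which is where mystery+0xe comes from.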
Another thing you need to realize is that in ARM (and many other processors), setting flags and using flags is usually done in separate instructions. That's why you first do TST or CMP (or other flag-setting instruction), and then use conditional instructions, IT, or conditional branches.
Upvotes: 3
Reputation: 60681
190: 2300 movs r3, #0 // assign the value 0 to R3, affecting
// the status flags (the S suffix)
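19a: 3301 addne r3, #1 // add 1 to r3 IF the Z flag is clear, i.e. the
// preceding tst found bit 0 of r0 set
The ne suffix checks the status flags, which at that point were last set by the tst.w instruction at 194. The movs at 190 also sets the flags, but they are overwritten by cmp and tst before addne runs.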
Upvotes: 3