How is the Auxiliary Flag calculated in x86 Assembly?
The majority of the resources I can find explain that the Auxiliary Flag is set to '1' if there is a carry from bit 3 to bit 4.
It indicates when a carry or borrow has been generated out of the least significant four bits of the accumulator register following the execution of an arithmetic instruction.
Example:
mov al,-14 *(1111 0010)
mov bl,-130 (0111 1110)
sub al,bl (1111 0010 – 0111 1110)
* the brackets show the stored binary patterns.
Result: 1111 0010 – 0111 1110 will be calculated as 1111 0010 + 1000 0010 using two's complement, giving the result 0111 0100 + OF.
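A minimal C sketch (purely illustrative, not part of the original question) that just reproduces the stored bit patterns and the result described above; the variable names mirror the registers:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t al = (uint8_t)-14;    /* stored as 1111 0010 (0xF2) */
    uint8_t bl = (uint8_t)-130;   /* -130 does not fit in 8 bits; stored as 0111 1110 (0x7E) */
    uint8_t result = al - bl;     /* 1111 0010 - 0111 1110 */

    printf("al=%02X bl=%02X result=%02X\n",
           (unsigned)al, (unsigned)bl, (unsigned)result);
    /* prints: al=F2 bl=7E result=74  (0111 0100) */
    return 0;
}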
In the example given the AF is set (=1). I do not understand why this is, as I cannot see that there has been a carry from bit 3 to bit 4. The addition of 0010 + 0010, the least significant nibbles, equals 0100, no carry. The least significant four bits of the accumulator register have changed from 0010 to 0100 ('2' to '4'), so there has been no carry from the lower nibble to the higher nibble?
Please could someone kindly explain where my thinking has gone awry?
I have a suspicion that the abundance of 'negatives' is throwing me off at some point, as I have tried several different examples in the debugger and they all act in accordance with my expectations, bar this one example.
Upvotes: 0
The sub instruction on x86 CPUs has been a "real" instruction since the first chip, the 8086, i.e. it's not some kind of assembler convenience that gets translated into negation + add; it has its own binary opcode and the CPU itself is aware it should produce the result of a subtraction.
Intel's definition of that instruction specifies how it affects the flags, and in this case the flags are modified "as if" a real subtraction were calculated. That's all you need to know when you are focusing on a programming algorithm or reviewing the correctness of some code. Whether the chip itself implements it as addition, with some extra transistors converting the flags to the "subtraction" variant, is an "implementation detail", and as long as you only want the result, it's not important.
The implementation details become important when you are tuning a particular piece of code for performance; then considering the inner architecture of the chip and the implementation of particular opcodes may give you ideas on how to rewrite the code in a somewhat more unintuitive/non-human way, often even with more instructions than the "naive" version, yet with better performance due to better exploitation of the chip's inner implementation.
But the result is well defined and can't be changed by some implementation detail; that would be a "bug in the CPU", like the first Pentium chips, which calculated wrong results for certain divisions.
That said, the definitions of assembly instructions already leak implementation details like no other language, because assembly instructions are designed half-way along the path of "what is simple to create in HW transistors" and half-way along "what makes some programming sense", while higher level programming languages are a lot more biased toward "what makes sense", only reluctantly imposing some cumbersome limits from the HW implementation, like for example value ranges for particular bit-sizes of variable types.
So being curious about the implementation and why certain things are defined as they are (like, for example, why dec xxx does NOT update the CF flag, while otherwise it is just sub xxx,1) will often give you new insights into how certain tasks can be written more effectively in assembly, how chips have evolved, and which tasks are easier to compute than others.
But basics first. The sub instruction updates flags as if a subtraction was calculated, and the sub instruction is not aware of any context of the values it is processing; all it gets is the binary patterns of the values, in your case: 1111_0010 – 0111_1110, which, when interpreted in signed 8-bit math, is "-14 - +126" (-130 doesn't fit into 8 bits, so it got truncated to +126; a good assembler will emit a warning/error there), or, when interpreted in unsigned 8-bit math, "242 - 126". In the case of signed math the result should be -140, which gets truncated (overflow happens, OF=1) to the 8-bit value +116; in the case of unsigned math the result is +116 without unsigned overflow (carry/borrow CF=0).
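To make those two readings concrete, here is a small C sketch (my illustration only, not something from the Intel manuals) of the truncation and of the signed vs. unsigned interpretation of the same bit patterns:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t a = 0xF2;   /* 1111_0010: -14 signed, 242 unsigned */
    uint8_t b = 0x7E;   /* 0111_1110: +126 signed (-130 truncated), 126 unsigned */

    int    wide_signed   = (int8_t)a - (int8_t)b;  /* -14 - 126 = -140, does not fit in 8 bits */
    int    wide_unsigned = a - b;                  /* 242 - 126 =  116, fits, no borrow needed */
    int8_t truncated     = (int8_t)(a - b);        /* 8-bit result: +116 (0x74) */

    printf("signed: %d  unsigned: %d  8-bit result: %d\n",
           wide_signed, wide_unsigned, truncated);
    /* signed: -140  unsigned: 116  8-bit result: 116
       -140 is outside the signed 8-bit range, so OF = 1;
       the unsigned subtraction needs no borrow, so CF = 0 */
    return 0;
}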
The subtraction itself is well defined per-bit, i.e.
          1111_0010
        – 0111_1110
        ___________
result:   0111_0100
borrow:   0111_1100
               ^ this borrow (out of bit 3) goes to AF
          ^ the last borrow (out of bit 7) goes to CF
          ^ the last result bit goes to SF
An all-zero result sets ZF=1 (not the case here).
PF is calculated from only the low 8 bits of the result (even with 32-bit registers!),
where PF=1 means there was an even number of set bits, like here (4).
You can go from right to left and do per-bit subtractions, i.e. 0-0=0, 1-1=0, 0-1=1+b, 0-2=0+b, etc. (where +b signals the need for a "borrow", i.e. the first operand gets borrowed +2 (+1 in the next bit) to make the result a valid bit value of 0 or 1).
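Here is a rough C simulation of that right-to-left per-bit process (a sketch of the definition above, not the CPU's actual circuitry; PF is computed with the GCC/Clang __builtin_popcount helper), deriving AF from the borrow out of bit 3, CF from the last borrow, SF from the last result bit, and ZF/PF from the result byte:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t a = 0xF2, b = 0x7E;       /* 1111_0010 - 0111_1110 */
    uint8_t result = 0, borrows = 0;  /* per-bit result and borrow-out pattern */
    int borrow = 0;

    for (int i = 0; i < 8; i++) {               /* right to left, bit by bit */
        int abit = (a >> i) & 1;
        int bbit = (b >> i) & 1;
        int d = abit - bbit - borrow;           /* 0-0=0, 1-1=0, 0-1=1+b, 0-2=0+b ... */
        borrow = (d < 0);                       /* need to borrow from the next bit? */
        if (borrow) d += 2;
        result  |= (uint8_t)(d << i);
        borrows |= (uint8_t)(borrow << i);
    }

    int af = (borrows >> 3) & 1;                /* borrow out of bit 3 */
    int cf = (borrows >> 7) & 1;                /* the last borrow, out of bit 7 */
    int sf = (result  >> 7) & 1;                /* the last (top) result bit */
    int zf = (result == 0);
    int pf = !(__builtin_popcount(result) & 1); /* even number of set bits in the low byte */

    printf("result=%02X borrows=%02X AF=%d CF=%d SF=%d ZF=%d PF=%d\n",
           (unsigned)result, (unsigned)borrows, af, cf, sf, zf, pf);
    /* result=74 borrows=7C AF=1 CF=0 SF=0 ZF=0 PF=1 -- matching the table above */
    return 0;
}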
BTW, exactly how OF is set at the bit level is a bit more tricky; there's a nice Q+A about it here on SO, you can search for it. But from a math point of view, if the result gets "truncated" in the signed interpretation (like in this example), then OF is set. That's how it is defined (and implementations conform to that).
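One common way to express that rule in code (only a sketch, using the sign-comparison formulation rather than whatever the silicon actually does): for a - b, OF is set when the operands have different sign bits and the result's sign differs from the sign of the first operand:

#include <stdio.h>
#include <stdint.h>

/* Signed-overflow rule for 8-bit a - b: overflow can only happen when the
   operands have different signs, and it did happen when the result's sign
   differs from the first operand's sign. */
static int sub_of(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    int sa = a >> 7, sb = b >> 7, sr = r >> 7;
    return (sa != sb) && (sr != sa);
}

int main(void) {
    printf("OF for F2 - 7E: %d\n", sub_of(0xF2, 0x7E));  /* 1: -14 - 126 = -140 overflows */
    printf("OF for 05 - 03: %d\n", sub_of(0x05, 0x03));  /* 0: 5 - 3 = 2, no overflow */
    return 0;
}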
As you can see, all flags are set as defined. The sub doesn't even know if the first argument is -14 or +242, as that doesn't change anything at the bit level; the instruction will just subtract one bit pattern from the other and set up all flags as defined, done. What the bit patterns represent, and how the flag results will be interpreted, is up to the following instructions (the logic of the code), but not a concern of the sub itself.
It's still possible the subtraction is implemented by addition inside the CPU (although it's very unlikely, as it's not difficult to implement subtraction), with some extra flag handling to fix up the flags, but that depends on the particular chip implementation.
Mind you, the modern x86 is quite a complex beast, translating classic x86 instructions into micro-code operations first, reordering them when possible to avoid stalls (like waiting for a value from a memory chip), sometimes executing several micro-operations in parallel (up to 3 operations at one time, IIRC), and using a hundred-plus physical registers which are dynamically renamed/mapped to the original ones (like al, bl in your code). I.e. if you copied those 3 lines of asm twice, one copy under the other, a modern x86 CPU would probably execute them quite in parallel with two different physical "al" registers, and the next code asking for the result in "al" would get that value from the later one; the first one is obviously discarded by the second sub. But all of this is defined and built to make the observable result "as if a classic 8086 had sequentially run each instruction separately over a real single physical AL register", at least in the single-core sense (in a multi-core/thread setup there are additional instructions that allow the programmer to serialize/finalize the results at a certain point of the code, so the other core/thread can see them in a consistent way).
So as long as you are just learning x86 assembly basics, you don't really need to know that there's a microarchitecture inside a modern x86 CPU translating your machine code into a different one (which is not directly available to programmers, so there's no "modern x86 micro-assembly" where you can write those micro-ops directly; you can only produce the regular x86 machine code and let the CPU handle that internal implementation itself).
Upvotes: 4