Reputation: 121
I'm learning x86 assembly language and ran into a problem: which of the following is faster, and why?
ADD AX, 100
ADD AX, BX
The answer in the book is the second, but I think the second one needs to read a register first, whereas the first one can add directly. Can anyone tell me the answer?
Upvotes: 0
Views: 162
Reputation: 28921
Go back far enough to 8086/8088, and lea ax,100[ax] was faster than add ax,100 . I'm not sure about 80286.
Upvotes: 0
Reputation: 95410
The answer depends on the actual implementation of the CPU, which depends on when it was designed. Older CPUs will have different timings than newer ones.
With modern CPUs, in general these will be the same speed, because the CPU designers have thrown a huge amount of resources at making basic instructions fast in common cases.
Even so, one can construct circumstances in which ADD AX,BX will be faster (for example, when the 2-byte ADD AX,BX fits entirely within the current cache line but the 3-byte ADD AX,100 would spill into the next line, which has not yet arrived from memory even with prefetching). One can also construct circumstances in which ADD AX,100 will be faster (for example, when BX is produced by some earlier instruction that takes a long time to complete).
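To make the cache-line case concrete, here is a small Python sketch that checks whether an instruction starting at a given byte offset spills into the next cache line. The 64-byte line size and the function name crosses_line are assumptions for illustration. The instruction lengths are the 16-bit encodings: ADD AX,BX is 2 bytes (e.g. 01 D8) and ADD AX,100 is 3 bytes (83 C0 64).

```python
# Sketch: does an instruction that starts at byte offset `addr` and is
# `length` bytes long extend past the end of its cache line?
# ASSUMPTION: 64-byte cache lines (typical of modern x86, not from the answer).
CACHE_LINE = 64

# 16-bit mode encodings: ADD AX,BX = 01 D8, ADD AX,100 = 83 C0 64
LEN_ADD_AX_BX = 2
LEN_ADD_AX_100 = 3


def crosses_line(addr: int, length: int, line: int = CACHE_LINE) -> bool:
    """True if the bytes [addr, addr + length) touch two different lines."""
    first_line = addr // line
    last_line = (addr + length - 1) // line
    return first_line != last_line


if __name__ == "__main__":
    offset = 62  # instruction starts two bytes before a line boundary
    print("ADD AX,BX  crosses line:", crosses_line(offset, LEN_ADD_AX_BX))
    print("ADD AX,100 crosses line:", crosses_line(offset, LEN_ADD_AX_100))
```

At that offset only the longer, immediate form needs bytes from the next line. This is exactly the kind of placement-dependent effect that makes a general answer impossible.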
For this particular pair of instructions, I wouldn't spend much time worrying about it. It is best to write your code using what you think are reasonable choices (float-add is almost always slower than integer-add because it is much more complex). [Once you have written a fair amount of assembly code, this is pretty easy.] After you have running code, measure performance and optimize where necessary. Usually the place that needs optimization is a surprise.
Upvotes: 1
Reputation: 11582
In modern processors there is no difference in performance. If you change the immediate from 100 to 128 (or larger) then there might be a significant difference. I know that sounds strange.
There are several manufacturers of x86 processors (Intel, AMD, Via), and each has many generations of processor designs (micro-architectures). Your question cannot be answered in general because the answer depends on the micro-architecture. For Intel, a good resource for this sort of question is the Intel® 64 and IA-32 Architectures Optimization Reference Manual.
Modern high-performance CPUs are complex machines. For most code you shouldn't have to worry about this level of detail: you write in a high-level language, use an optimizing compiler, and are happy. When the performance of your code is vital, you might have to be concerned with these details. In that case you need to understand the specific micro-architecture you are targeting, which mode the processor is in, and perhaps the actual value of the immediate (surprise!). Relevant to your question is whether the processor is in 16-bit Real Mode or 32-bit mode.
The instruction in your question, ADD AX,100, is adding a 16-bit immediate (which can be encoded as a signed 8-bit immediate) to a 16-bit register. That can be done with a different opcode than if you use a signed immediate that doesn't fit in 8 bits. I used the following website to assemble these instructions:
https://defuse.ca/online-x86-assembler.htm#disassembly
Notice that an ADD of an 8-bit signed immediate to AX can be encoded with a different opcode than an ADD with a 16-bit signed immediate:
0: 83 c0 64 add ax,100
3: 05 80 00 add ax,128
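Here is a minimal Python sketch of the selection rule an assembler might apply in 16-bit mode. It uses the sign-extended imm8 form (83 /0 ib) when the value fits in a signed byte and otherwise falls back to the accumulator form with a 16-bit immediate (05 iw). The function name is hypothetical and only ADD AX, imm is handled. It reproduces the two encodings in the listing above.

```python
# Sketch of how an assembler might choose an encoding for ADD AX, imm
# in 16-bit (real) mode. HYPOTHETICAL helper; only this one case is handled.


def encode_add_ax_imm(imm: int) -> bytes:
    """Encode ADD AX, imm (signed 16-bit), preferring the shorter form."""
    if not -32768 <= imm <= 32767:
        raise ValueError("immediate does not fit in a signed 16-bit value")
    if -128 <= imm <= 127:
        # 83 /0 ib: ModRM 0xC0 selects /0 (ADD) with AX as the r/m operand;
        # the CPU sign-extends the single immediate byte to 16 bits.
        return bytes([0x83, 0xC0, imm & 0xFF])
    # 05 iw: accumulator short form with a little-endian 16-bit immediate.
    return bytes([0x05]) + (imm & 0xFFFF).to_bytes(2, "little")


if __name__ == "__main__":
    for value in (100, 128):
        print(f"add ax,{value}:", encode_add_ax_imm(value).hex(" "))
```

Both forms happen to be 3 bytes for AX in 16-bit mode. Real assemblers apply the same imm8-versus-full-immediate choice to every register and ALU opcode.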
You might be wondering: so what? It is the same number of bytes... but there's more to it than that. In 32-bit mode, some instruction encodings that were interpreted as a 16-bit ADD in Real Mode are now interpreted as a 32-bit ADD. To encode a 16-bit add in 32-bit mode, x86 requires an operand-size override prefix byte, 0x66. The ADD with a sign-extended 8-bit immediate (opcode 0x83) still carries just one immediate byte:
0: 66 83 c0 64 add ax,100
4: 66 05 80 00 add ax,128
8: 83 c0 64 add eax,100
b: 05 80 00 00 00 add eax,128
Here's the important thing: notice that the 0x05 opcode is followed by either two bytes (when the 0x66 prefix is present) or four bytes (the default, when 0x66 is not present). This wreaks havoc with the instruction predecoder, which tries to decode many instructions at once. Because x86 instructions can be anywhere from 1 to 15 bytes long, it makes assumptions about default sizes based on opcodes. A 0x66 prefix on an instruction with a 16-bit immediate changes the overall length of the instruction. This is known as a length-changing prefix (LCP), and it can introduce a three- to six-cycle stall in the decoder, depending on the micro-architecture, which can be significant.
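The following Python sketch is a toy model of why the 0x66 prefix causes trouble only for the 0x05 form. It covers just the two ADD opcodes from the 32-bit listing above. It compares the real length with a "predecoder guess" computed as if the operand size were always the 32-bit default, and flags the instruction as length-changing when the two differ. The function names and this simplified guess are assumptions for illustration; a real predecoder handles far more cases.

```python
# Toy model of the length-changing-prefix (LCP) problem in 32-bit mode.
# Only 0x83 /0 ib and 0x05 iz (ADD eAX, imm) are modelled. HYPOTHETICAL names.


def immediate_size(opcode: int, operand_size: int) -> int:
    if opcode == 0x83:
        return 1                    # imm8, regardless of operand size
    if opcode == 0x05:
        return operand_size // 8    # imm16 with 0x66, imm32 without
    raise ValueError(f"opcode {opcode:#04x} not modelled")


def split_prefix(code: bytes) -> tuple[bool, int]:
    has_prefix = code[0] == 0x66
    return has_prefix, code[1] if has_prefix else code[0]


def instruction_length(code: bytes) -> int:
    """Actual length: 0x66 switches the operand size to 16 bits."""
    has_prefix, opcode = split_prefix(code)
    modrm = 1 if opcode == 0x83 else 0
    operand_size = 16 if has_prefix else 32
    return int(has_prefix) + 1 + modrm + immediate_size(opcode, operand_size)


def predecoder_guess(code: bytes) -> int:
    """Length computed as if the operand size were always 32 bits."""
    has_prefix, opcode = split_prefix(code)
    modrm = 1 if opcode == 0x83 else 0
    return int(has_prefix) + 1 + modrm + immediate_size(opcode, 32)


def is_lcp(code: bytes) -> bool:
    return predecoder_guess(code) != instruction_length(code)


if __name__ == "__main__":
    for text in ("66 83 c0 64", "66 05 80 00", "83 c0 64", "05 80 00 00 00"):
        code = bytes.fromhex(text)
        print(text, "len", instruction_length(code), "LCP", is_lcp(code))
```

Only 66 05 80 00 is flagged. The prefixed imm8 form has the same length either way, which is why favoring imm8 or imm32 over imm16 avoids the stall.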
Search for the following rules in Intel's optimization manual for more information
Assembly/Compiler Coding Rule 21. (MH impact, MH generality) Favor generating code using imm8 or imm32 values instead of imm16 values.
and
Assembly/Compiler Coding Rule 27. (M impact, MH generality) Avoid using prefixes to change the size of immediate and displacement.
Upvotes: 1
Reputation: 22478
In older 80x86 CPUs, immediate operands had to be read from memory, while register operands were encoded in the instruction itself, which had already been 'read'. So
add ax, bx
was a single instruction; after reading it, everything needed was "inside" the CPU and could be processed immediately.
The instruction
add ax, 100
was parsed as add ax, ?
and so the CPU needed to read the next word from memory before it could continue.
This is no longer true for new CPUs, but the book the OP refers to (its title and publication date are not mentioned) may very well be old enough.
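To put numbers on the older-CPU argument, this Python sketch encodes both instructions from the question in 16-bit mode and counts the bus fetches needed to read them. Register numbers follow the standard x86 ModRM encoding. The helper names and the simplified fetch model are assumptions for illustration. The model assumes one byte per bus cycle, as on the 8088's 8-bit data bus, and ignores alignment and the prefetch queue. It gives fetch counts, not cycle-accurate timings.

```python
# Sketch: bytes the CPU must fetch for ADD AX,BX versus ADD AX,100 (16-bit).
# HYPOTHETICAL helpers; the fetch model is deliberately simplified.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_add_reg_reg(dst: str, src: str) -> bytes:
    # 01 /r: ADD r/m16, r16. mod=11 means both operands are registers, so the
    # instruction is just opcode + ModRM with no extra operand bytes.
    modrm = 0b11000000 | (REG16[src] << 3) | REG16[dst]
    return bytes([0x01, modrm])


def encode_add_ax_imm8(imm: int) -> bytes:
    # 83 /0 ib: the immediate is one more byte after the ModRM byte.
    if not -128 <= imm <= 127:
        raise ValueError("immediate does not fit in a signed byte")
    return bytes([0x83, 0xC0, imm & 0xFF])


def bus_fetches(code: bytes, bus_bytes: int = 1) -> int:
    # Ceiling division; bus_bytes=1 models an 8088-style 8-bit data bus.
    return -(-len(code) // bus_bytes)


if __name__ == "__main__":
    for name, code in (("add ax, bx", encode_add_reg_reg("ax", "bx")),
                       ("add ax, 100", encode_add_ax_imm8(100))):
        print(f"{name:12} {code.hex(' '):9} fetches: {bus_fetches(code)}")
```

On such a bus, the immediate form needs one more fetch before the CPU can finish decoding. That is consistent with the book's answer for very old processors.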
Upvotes: 0
Reputation: 19736
It depends on the context (the rest of the program).
The second instruction introduces a data dependency: if you just had to load BX from main memory, you may have to stall for a long time. On the other hand, the first instruction increases the code footprint, because the immediate value is encoded in the instruction and needs more space in the instruction cache. That may be critical if it's just enough to cause a few extra misses in some performance-critical loop.
On top of that, there are CPUs today that can perform register copies without executing anything (just using register renaming), so it also depends on the exact micro-architecture you use.
My advice: find another book, one that does not presume to tell you what will always happen. Also, the use of AX and BX suggests the book is rather old...
Upvotes: 1