Reputation: 121

Which assembly language statement executes faster?

I'm learning x86 assembly language and ran into a problem: which of the following is faster, and why?

ADD AX, 100 

ADD AX, BX

The answer in the book is the second, but I think the second one needs to read a register first, whereas the first one can add directly. Can anyone please tell me the answer?

Upvotes: 0

Answers (5)

rcgldr

Reputation: 28941

Go back far enough to 8086/8088, and lea ax,100[ax] was faster than add ax,100. I'm not sure about 80286.

Upvotes: 0

Ira Baxter

Reputation: 95420

The answer will depend on the actual implementation of the CPU, which depends on when it was designed. Older CPUs will have different timings than newer ones.

With modern CPUs, in general these will be the same speed, because the CPU designers have thrown a huge amount of resources at making basic instructions fast in common cases.

Even so, one can construct circumstances in which the ADD AX,BX will be faster (last instruction fully within the cache line, with the next cache line not yet arrived from memory even with prefetch) and some in which the ADD AX, 100 will be faster (BX is fed by some earlier instruction which takes a long time to complete).

For this particular pair of instructions, I wouldn't spend much time worrying about it. Best you write your code using what you think are reasonable choices (float-add is almost always slower than integer-add because it is much more complex). [Once you have written a fair amount of assembly code this is pretty easy]. After you have running code, measure performance and optimize where necessary. Usually the place that needs optimization is a surprise.

Upvotes: 1

amdn

Reputation: 11582

In modern processors there is no difference in performance. If you change the immediate from 100 to 128 (or larger) then there might be a significant difference. I know that sounds strange.

There are several manufacturers of x86 processors (Intel, AMD, Via), and each has many generations of processor designs (micro-architectures). Your question cannot be answered in general because the answer depends on the micro-architecture. For Intel, a good resource for this sort of question is the Intel® 64 and IA-32 Architectures Optimization Reference Manual.

Modern high-performance CPUs are complex machines. For most code you shouldn't have to worry about this level of detail: you write in a high-level language, use an optimizing compiler, and be happy. When the performance of your code is vital you might have to be concerned with these details. If that's the case, then you need to understand the specific micro-architecture you are targeting, which mode the processor is in, and perhaps the actual value of the immediate (surprise!). Relevant to your question is whether the processor is in

Real Mode (16-bit)
32-bit mode, or x86-64 long mode

The instruction in your question, ADD AX,100, adds a 16-bit immediate (which can be encoded as a signed 8-bit immediate) to a 16-bit register. That can be done with a different opcode than if you use a signed immediate that doesn't fit in 8 bits. I used the following website to assemble these instructions:

https://defuse.ca/online-x86-assembler.htm#disassembly

Notice that encoding an ADD of an 8-bit signed immediate to AX can be done using a different opcode than encoding an ADD with a 16-bit signed immediate.

16-bit (Real Mode, Virtual 8086 Mode)

0:  83 c0 64             add    ax,100
3:  05 80 00             add    ax,128
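The choice the assembler made in the listing above can be sketched in a few lines of Python. This is only an illustration of the imm8-versus-imm16 rule for a 16-bit code segment; the function name `encode_add_ax_imm` is made up for this example, and a real assembler handles far more cases.

```python
import struct

def encode_add_ax_imm(imm: int) -> bytes:
    """Encode 16-bit `add ax, imm` as it would appear in a 16-bit code segment."""
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    # Treat the value as a signed 16-bit quantity.
    signed = imm - 0x10000 if imm > 0x7FFF else imm
    if -128 <= signed <= 127:
        # 83 /0 ib: ADD r/m16, sign-extended imm8; ModRM 0xC0 selects AX with /0 = ADD.
        return bytes([0x83, 0xC0]) + struct.pack("<b", signed)
    # 05 iw: ADD AX, imm16 (short accumulator form).
    return bytes([0x05]) + struct.pack("<h", signed)

print(encode_add_ax_imm(100).hex(" "))  # 83 c0 64
print(encode_add_ax_imm(128).hex(" "))  # 05 80 00
```

So 100 fits in a signed byte and gets the 0x83 form, while 128 does not and falls back to the 0x05 form with a full 16-bit immediate.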

You might be wondering, so what? It is the same number of bytes... but there's more to it than that. In 32-bit mode, some instruction encodings that in Real Mode were interpreted as a 16-bit ADD are now interpreted as a 32-bit ADD. In order to encode a 16-bit add in 32-bit mode, x86 requires an operand-size override prefix byte, 0x66. The encoding of the 8-bit immediate remains the same:

32-bit or x86-64 (long mode)

0:  66 83 c0 64             add    ax,100
4:  66 05 80 00             add    ax,128
8:     83 c0 64             add    eax,100
b:     05 80 00 00 00       add    eax,128

Here's the important thing: notice that the 0x05 opcode is followed by either two bytes (when the 0x66 prefix is present) or four bytes (the default, when 0x66 is not present). This wreaks havoc with the instruction predecoder, which is trying to decode many instructions at once; since x86 instructions can be anywhere from 1 to 15 bytes long, it makes assumptions about default sizes based on opcodes. The 0x66 prefix on an instruction that has a 16-bit immediate changes the overall length of the instruction. This is known as a length-changing prefix (LCP) and can introduce a three to six cycle stall in the decoder, depending on micro-architecture, which can be significant.
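To make the distinction concrete, here is a rough Python sketch that flags which of the four 32-bit-mode encodings in the listing above carries a length-changing prefix. The helper `has_length_changing_prefix` is hypothetical and only knows the two ADD opcodes discussed here; it is not a general x86 decoder.

```python
def has_length_changing_prefix(code: bytes) -> bool:
    """Toy check: does a 0x66 prefix change this instruction's immediate size?

    Assumes 32-bit or long mode and covers only opcodes 0x05 and 0x83.
    """
    if len(code) < 2 or code[0] != 0x66:
        return False      # no operand-size prefix, so the length is the default
    opcode = code[1]
    if opcode == 0x05:    # ADD eAX, imm: immediate shrinks from 4 bytes to 2
        return True
    if opcode == 0x83:    # ADD r/m, imm8: immediate is 1 byte either way
        return False
    raise ValueError("opcode not covered by this sketch")

for text in ("66 83 c0 64", "66 05 80 00", "83 c0 64", "05 80 00 00 00"):
    print(text, has_length_changing_prefix(bytes.fromhex(text)))
```

Only `66 05 80 00` (add ax,128) is flagged: the prefix is present and the opcode takes a full-size immediate. `66 83 c0 64` (add ax,100) also has the prefix, but its immediate is one byte regardless, so its length does not change.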

Search for the following rules in Intel's optimization manual for more information

Assembly/Compiler Coding Rule 21. (MH impact, MH generality) Favor generating code using imm8 or imm32 values instead of imm16 values.

and

Assembly/Compiler Coding Rule 27. (M impact, MH generality) Avoid using prefixes to change the size of immediate and displacement.

Upvotes: 1

Jongware

Reputation: 22478

In older 80x86 CPUs, immediate operands had to be fetched from memory as extra instruction bytes, while register operands were encoded in the instruction itself, which had already been read. So

add ax, bx

was a single instruction; after reading it, everything needed was "inside" the CPU and could be processed immediately.

The instruction

add ax, 100

was parsed as add ax, ? and so the CPU needed to read the next word from memory before it could continue.

This is no longer true for new CPUs, but the book the OP refers to (its title and publication date are not mentioned) may very well be old enough.

Upvotes: 0

Leeor

Reputation: 19746

It depends on the context (the rest of the program).

The second instruction introduces a data dependency: if BX has just been loaded from main memory, you may have to stall for a long time. On the other hand, the first instruction increases the code footprint, since encoding the immediate value takes more space in the instruction cache, which may be critical if it's just enough to cause a few extra misses in some performance-critical loop.
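The dependency half of that argument can be illustrated with a toy model in Python. Everything here is an assumption made for the sake of the example: the 200-cycle load latency, the 1-cycle add, and the rule that an instruction starts as soon as its source registers are ready. A real out-of-order core is far more complicated.

```python
def finish_times(program):
    """program: list of (dest, sources, latency); returns each instruction's finish cycle.

    Toy model: an instruction starts once all of its source registers are ready.
    Issue width, execution ports, and caches are ignored.
    """
    ready = {}      # register name -> cycle at which its value becomes available
    finished = []
    for dest, sources, latency in program:
        start = max((ready.get(reg, 0) for reg in sources), default=0)
        done = start + latency
        ready[dest] = done
        finished.append(done)
    return finished

LOAD_LATENCY = 200  # assumed cost of a load that misses the caches, purely illustrative

with_register = [("bx", [], LOAD_LATENCY), ("ax", ["ax", "bx"], 1)]   # mov bx,[mem]; add ax,bx
with_immediate = [("bx", [], LOAD_LATENCY), ("ax", ["ax"], 1)]         # mov bx,[mem]; add ax,100

print(finish_times(with_register)[-1])   # 201
print(finish_times(with_immediate)[-1])  # 1
```

In this model the add that reads BX cannot finish until the load does, while the add with an immediate does not wait on it at all. The code-footprint side of the trade-off is not modelled.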

On top of that, there are CPUs today that can perform register copies without executing anything (just using register renaming), so it also depends on the exact micro-architecture you use.

My advice is - find another book, one that does not presume to tell you what will always happen. Also, using AX and BX implies it's rather old...

Upvotes: 1
