Reputation: 613
Why would you want to use:
MOV EAX, 22
SHL EAX, 2
...when multiplying by 4 opposed to just using the MUL
instruction?
I understand that this can also be done with SHR
instead of DIV
as well.
What are the advantages of doing this?
Also can you do this with odd numbers or can it only be even numbers?
Upvotes: 3
Views: 3343
Reputation: 170
Using the SHL
/SHR
instruction is, generally speaking, much faster than MUL
/DIV
.
To answer your second question, you can do this with odd numbers as well, but you do have to add another instruction. So you can't technically just do it using the SHL
/SHR
.
For example: the following code multiplies by 5 without using the MUL
instruction:
mov num, 5
mov eax, num
mov ebx, num
shl eax, 2 ; MULs by 4
add eax, ebx ; ADD the x1 to make = 5
Upvotes: 3
Reputation: 95400
There are a number of code idioms that are faster than "MUL constant".
Modern x86 CPUs execute a MUL in several clocks, minimum. So any code sequence that computes the product in 1-2 clocks will outperform the MUL. You can use fast instructions (ADD, SHL, LEA, NEG) and the fact that the processor may execute some of these instructions in parallel in a single clock to replace MUL. Arguably this means you can perform 4 of these instructions in many combinations in 2 clocks if you avoid some data dependencies.
The LEA instruction is particularly interesting because it can multiply by some small constants (1,2,3,4,5,8,9) as well as move the product to another register, which is one easy way to break data dependencies. That allows you to compute a sub-product without destroying the original operand.
Some examples:
Multiply EAX by 5, move product to ESI:
LEA ESI, [EAX+4*EAX] ; this takes 1 clock
Multiply EAX by 18:
LEA EAX, [EAX + 8*EAX]
SHL EAX, 1
Multiply EAX by 7, move result to EBX:
LEA EBX, [8*EAX]
SUB EBX, EAX
Multiply EAX by 28:
LEA EBX, [8*EAX]
LEA ECX, [EAX+4*EAX] ; this and previous should be executed in parallel
LEA EAX, [EBX+4*ECX]
Multiply by 1020:
LEA ECX, [4*EAX]
SHL EAX, 10 ; this and previous instruction should be executed in parallel
SUB EAX, ECX
Multiply by 35
LEA ECX, [EAX+8*EAX]
NEG EAX ; = -EAX
LEA EAX, [EAX+ECX*4]
So, when you want to achieve the effect of multiplying by a modest size constant, you have to think about how it can be "factored" into various products that the LEA instruction can produce, and how one might shift, add, or subtract a partial result to get the final answer.
It is remarkable how many multiply-by-constants can be produced this way. You might think this is only useful for really small constants but as you can see from the 1020 example above you can get some surprisingly medium size ones, too. This turns out be be really handy when indexing into arrays-of-structs because you have to multiply an index by the size of the struct. Often when indexing an array like this, you want to compute the element address and fetch the value; in this case you can merge a final LEA instruction into a MOV instruction, which you cannot do with a real MUL. This buys you additional clock cycle(s) in which to do the MUL by this type of idiom.
[I have built a compiler that computes the "best multiply by constant" using these instructions by doing a small exhaustive search of instruction combinations; it then caches that answer for later reuse].
Upvotes: 5