Reputation: 13850
I've always been curious about the cost of jumps in assembly.
cmp ecx, edx
je SOME_LOCATION # What's the cost of this jump?
Does it need to search a lookup table for each jump, or how does it work?
Upvotes: 4
Views: 2765
Reputation: 12434
No, a jump doesn’t do a search. The assembler resolves the label to an address, which in most cases is then converted to an offset from the current instruction. The address or offset is encoded in the instruction. At run time, the processor loads the address into the IP register or adds the offset to the current value of the IP register (along with all the other effects discussed by @Brendan).
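To make the "adds the offset" part concrete, here is a minimal C sketch of how the target of a short relative jump is computed. It assumes the two-byte rel8 form (for example je rel8, opcode 0x74 followed by one signed displacement byte); the function name short_jump_target is made up for illustration.

```c
#include <stdint.h>

/* Target of a short relative jump (two-byte rel8 form, e.g. `je rel8`).
 * The displacement is a signed byte that is added to the address of the
 * NEXT instruction, i.e. the jump's own address plus its length (2).
 * `short_jump_target` is a hypothetical helper name, not a real API. */
uint32_t short_jump_target(uint32_t jump_address, uint8_t disp_byte)
{
    /* Sign-extend the 8-bit displacement portably: 0x00..0x7F stay
     * positive, 0x80..0xFF map to -128..-1. */
    int32_t disp = (int32_t)(disp_byte ^ 0x80u) - 0x80;

    uint32_t next_ip = jump_address + 2;  /* opcode byte + displacement byte */
    return next_ip + (uint32_t)disp;      /* unsigned add wraps like the IP  */
}
```

For example, a jump at address 0x1000 with displacement byte 0x05 lands at 0x1007, and a displacement byte of 0xFE (that is, -2) lands back on the jump itself. No table is consulted at any point; it is a single addition.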
There is a type of jump instruction that can be used to get the destination from a table. The jump instruction reads the address from a memory location. (The instruction specifies a single location, so there still is no “search”.) This instruction could look something like this:
jmp table[eax*4]
where eax is the index of the entry in the table containing the address to jump to.
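For comparison, the closest C equivalent of such a table jump is an array of function pointers indexed directly. This is only a sketch: the handler names and dispatch are invented for illustration, and the bounds check is something the bare assembly instruction would not do for you.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Three hypothetical handlers standing in for jump destinations. */
static int handle_add(int x)    { return x + 1; }
static int handle_double(int x) { return x * 2; }
static int handle_negate(int x) { return -x; }

/* The table: entry i holds the address to transfer control to. */
static const handler_fn table[] = { handle_add, handle_double, handle_negate };

/* One indexed load plus one indirect transfer, no searching.
 * The range check is an addition for safety; `jmp table[eax*4]`
 * would read whatever memory the index points at. */
int dispatch(size_t index, int x)
{
    if (index >= sizeof table / sizeof table[0])
        return x;               /* out of range: leave x unchanged */
    return table[index](x);
}
```

Calling dispatch(1, 10) picks the second entry and returns 20. The index selects exactly one slot, which is why this still isn't a "search".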
Upvotes: 4
Reputation: 37232
Originally (e.g. 8086) the cost of a jump wasn't much different to the cost of a mov.
Later CPUs added caches, which meant some jumps were faster (because the code they jump to is in the cache) and some jumps were slower (because the code they jump to isn't in the cache).
Even later CPUs added "out of order" execution, where conditional branches (e.g. je SOME_LOCATION) would have to wait until the flags from "previous instructions that happen to be executed in parallel" became known.
This means that a sequence like
mov esi, edi
cmp ecx, edx
je SOME_LOCATION
can be slower than rearranging it to
cmp ecx, edx
mov esi, edi
je SOME_LOCATION
to increase the chance that the flags would be known.
Even later CPUs added speculative execution. In this case, for conditional branches the CPU just takes a guess at where it will branch to before it actually knows (e.g. before the flags are known), and if it guesses wrong it'll just pretend that it didn't execute the wrong instructions. More specifically, the speculatively executed instructions are tagged at the start of the pipeline and held at the end of the pipeline (at retirement) until the CPU knows if they can be committed to visible state or if they have to be discarded.
After that things just got more complicated, with fancier methods of doing branch prediction, additional "branch target" buffers, etc.
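To give a feel for what "taking a guess" means, here is a small C sketch of a 2-bit saturating counter, a classic textbook prediction scheme. It is not a description of what any particular CPU does (real predictors are far more elaborate); the function name and the starting state are assumptions made for illustration.

```c
#include <stddef.h>

/* Textbook 2-bit saturating counter for a single branch.
 * States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
 * `taken[i]` is nonzero if the branch was actually taken the i-th time.
 * Returns how many of the n outcomes were guessed wrong. */
size_t count_mispredictions(const int *taken, size_t n)
{
    int counter = 1;            /* assumed start: weakly "not taken" */
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        int predicted_taken = (counter >= 2);
        int actually_taken  = (taken[i] != 0);

        if (predicted_taken != actually_taken)
            misses++;           /* here a real CPU would discard speculative work */

        /* Move the counter toward the actual outcome, saturating at 0 and 3. */
        if (actually_taken) {
            if (counter < 3) counter++;
        } else {
            if (counter > 0) counter--;
        }
    }
    return misses;
}
```

With this model, a loop branch that is taken nine times and then falls through is mispredicted only twice (the first and the last iteration), while a branch that strictly alternates taken/not taken is mispredicted every time. That is the intuition behind "predictable branches are nearly free, unpredictable ones are expensive".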
Far jumps that change the code segment are more expensive. In real mode it's not so bad because the CPU mostly only does "CS.base = value * 16" when CS is changed. For protected mode it's a table lookup (to find the GDT or LDT entry), decoding the entry, deciding what to do based on what kind of entry it is, then a pile of protection checks. For long mode it's vaguely similar. All of this adds more uncertainty (e.g. will the table entry be in cache?).
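The real-mode case is simple enough to write out. This C sketch just does the "CS.base = value * 16" arithmetic and adds the offset; the function name is hypothetical, and it deliberately does not model the wraparound at 1 MiB that happens on an 8086 with only 20 address lines.

```c
#include <stdint.h>

/* Real-mode address calculation: the segment register value is
 * multiplied by 16 to get the segment base, and the 16-bit offset
 * (e.g. IP) is added to it. No descriptor table is involved.
 * `real_mode_linear` is a made-up name for illustration. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t base = (uint32_t)segment << 4;   /* CS.base = value * 16 */
    return base + offset;
}
```

For example, segment 0x07C0 with offset 0x0000 and segment 0x0000 with offset 0x7C00 both give 0x7C00, and 0xF000:0xFFF0 gives 0xFFFF0. Compare that single shift-and-add with the protected-mode sequence of lookups and checks described above.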
On top of all of this there are things like TLB misses. For example, jmp [indirectAddress] can cause a TLB miss at indirectAddress and then another TLB miss at the new instruction pointer (an indirect call would add a third at the stack top, where the return address is pushed), and each TLB miss can cost a few hundred cycles.
Overall, the cost of a jump can be anything from 0 cycles (for a correctly predicted jump) to maybe 1000 cycles, depending on which CPU it is, what kind of jump it is, what is in the caches, what the branch predictor predicts, cache/TLB misses, how fast or slow RAM is, and anything I may have forgotten.
Upvotes: 7