In assembly, if I have a jump table with the addresses of over 2000 labels:
.TABLE:
DD .case0
DD .case1
DD .case2
DD .case3
DD .case4
...
...
...
DD .case2000
Which way of addressing the jump is better?
Way 1:
mov r12d, .TABLE  ; r12d or any other register
mov ebx, [r13d]   ; the dword at [r13d] holds case id * 4, so we don't need '4 * ebx'
add ebx, r12d     ; ebx = address of the table entry
mov ebx, [rbx]    ; ebx = address of the case label, loaded from the table
jmp rbx
Way 2 (same as way 1, but 'add ebx, r12d' and the load are folded into 'jmp [ebx+r12d]'):
mov r12d, .TABLE  ; r12d or any other register
mov ebx, [r13d]   ; the dword at [r13d] holds case id * 4, so we don't need '4 * ebx'
jmp [ebx+r12d]
Way 3:
mov ebx, [r13d]   ; the dword at [r13d] holds case id * 4, so we don't need '4 * ebx'
jmp [ebx + .TABLE]
Way 1 makes the code bigger because of the extra instructions, but I think it jumps faster than the other ways, because I'm going to do about 2000 jumps and they are irregular (maybe from case0 to case1000 or ...).
So for jump performance, which way is better in code that does a lot of jumps?
Using 32-bit addresses is a good choice if you can get away with it: it compresses the jump table to half the size of a table of qword pointers in 64-bit mode.
Otherwise, for 64-bit code you'd want to load 16-bit or 32-bit offsets (movzx or mov) and add them to a 64-bit base address from a RIP-relative LEA. (That also makes the code position-independent.)
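The two table layouts above can be sketched in C, assuming GCC or Clang, because it relies on the GNU "labels as values" extension (&&label, goto *ptr, and arithmetic on void *). The function names and the case return values are hypothetical. dispatch_absolute uses one full pointer per case (qword entries when compiled for 64-bit mode); dispatch_relative stores 32-bit offsets from a base label and adds them to that base at run time, which is the C analogue of loading a dword offset and adding it to a base address from a RIP-relative LEA. This only models the table layout; the compiler chooses the actual instructions.

```c
/* Sketch only: needs the GNU C "labels as values" extension (GCC/Clang). */

/* Absolute table: one full pointer per case (8 bytes each in 64-bit mode).
   Hypothetical example; id must be 0..3, there is no bounds check. */
int dispatch_absolute(unsigned id)
{
    static void *const targets[] = { &&case0, &&case1, &&case2, &&case3 };
    goto *targets[id];                 /* indirect jump through the table */
case0: return 100;
case1: return 101;
case2: return 102;
case3: return 103;
}

/* Relative table: 32-bit offsets from a base label (case0).
   The entries are not absolute addresses, so they need no relocations.
   Hypothetical example; id must be 0..3, there is no bounds check. */
int dispatch_relative(unsigned id)
{
    static const int offsets[] = {
        &&case0 - &&case0,
        &&case1 - &&case0,
        &&case2 - &&case0,
        &&case3 - &&case0,
    };
    goto *(&&case0 + offsets[id]);     /* base + offset, then indirect jump */
case0: return 200;
case1: return 201;
case2: return 202;
case3: return 203;
}
```

For example, dispatch_relative(2) jumps to case2 and returns 202. Because offsets[] is used in a static initializer, GCC will not inline or clone the function, so the label differences stay valid.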
"Fewest instructions is not always the solution!"
But in this case, fewest instructions also means fewest uops: [disp32 + reg] addressing modes are efficient.
If you were going to consider using more instructions, it would be to load the pointer into a register for jmp reg instead of using jmp [mem], not to simplify the addressing modes even further.
Agner Fog's instruction tables (https://agner.org/optimize/) show that jmp [mem] on Intel Sandybridge-family CPUs is still only 1 fused-domain uop, with the load micro-fused into the port 6 jump uop.
So a separate mov load would actually cost more uops in the front-end.
(An indexed addressing mode would probably unlaminate: jmp [.TABLE + ebx*4] would cost 2 uops for the issue/rename stage, but still only 1 in the decoders and uop cache. But it seems you have a byte offset stored in memory for some reason, so you don't need a scaled index.)
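To make the last point concrete, here is a small C sketch of the address arithmetic only (it cannot show uop counts or unlamination). The table contents and function names are hypothetical stand-ins for the DD .caseN table. entry_by_index mirrors [.TABLE + ebx*4], where the case index is scaled by 4; entry_by_byte_offset mirrors [.TABLE + ebx] with ebx already holding id * 4, so the offset is added unscaled. Both select the same entry.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit entries standing in for the DD .caseN table. */
static const uint32_t table[] = { 0x1000, 0x1010, 0x1020, 0x1030 };

/* Like [.TABLE + ebx*4]: the case index is scaled by the entry size. */
uint32_t entry_by_index(uint32_t id)
{
    return table[id];
}

/* Like [.TABLE + ebx] where ebx already holds id * 4:
   the byte offset is added to the table base with no scaling. */
uint32_t entry_by_byte_offset(uint32_t byte_off)
{
    uint32_t value;
    memcpy(&value, (const unsigned char *)table + byte_off, sizeof value);
    return value;
}
```

For example, entry_by_index(2) and entry_by_byte_offset(8) both return 0x1020.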