Reputation: 3427
I have a 455 byte array that contains 13, 35 byte data structures.
TurnTreeBuff: resb 455 ; TurnNum (1 byte) + MT (16 bytes) + PM (16 bytes)
; + BoardState (2 bytes) = 35 bytes * 13 turns = 455 bytes
Initially I thought I could access a particular data structure by taking an index, multiplying by 35 and adding it to TurnTreeBuff. But the only valid scale factors are 1, 2, 4 and 8.
So this:
mov word [TurnTreeBuff+ebx*35],ax ; doesn't work
I can do it by copying the index into another register, then shifting the original index value left 5 times and adding it's copy to it's shifted value 3 times. But this seems really cumbersome. It also seems really cumbersome to use the MUL command. Am I better off just padding my structure to be 64 bytes, so it's a multiple of 2? That seems really wasteful as well.
Is there a better way?
(Using NASM 2.10 to assemble a 32-bit binary)
Upvotes: 2
Views: 3798
Reputation: 20725
That * 35
would indeed not compile:
mov word [TurnTreeBuff+ebx*35],ax ; doesn't work
The processor only supports power of two and only the first few: x1, x2, x4, and x8. So if your structure is larger, you have to revert to using a multiplication.
Note that mul/imul are very fast (i.e. as fast as an add or sub) so you shouldn't worry about using such, although if a simple shift would work with your structure (i.e. 64 bytes as you mentioned) then using a shift is probably a lot better. (the mul/imul take longer if you use the result on the very next line.)
Finally, a mov in 32 bit process from an odd address is not a good idea. So the size of your structure should at least be a multiple of 4 bytes, so 36 bytes in your case.
P.S. you could also use both features: so your index (ebx) could be set to 0, 9, 18, etc. and then use the x4 in the instruction. However, using a multiplicator in the address field slows things down a bit... Yet, if you like to have fun, you can do what Jester proposed which is to multiply a bare index (0, 1, 2, etc.) and use lea ecx, [ebx * 8 + ebx]
to multiply by 9 and then use that with x4 in the other address. The big problem with such cool things is: if your structure changes its size... you have to rewrite a lot of code.
Now, what I would generally do, assuming you are looping over your array of structures, is add the size of the structure to the index. For example:
mov ebx, TurnTreeBuff ; get address of first structure
.L1:
...
mov al, [ebx+0] ; TurnNum
mov eax, [ebx+1] ; MT 1st work (should be aligned...)
mov eax, [ebx+5] ; MT 2nd work
..
mov ax, [ebx+33] ; BoardState
...
add ebx, 35 ; again, use a multiple of your architecture: 16, 32, 64 bits...
loop .L1
Now the mov
instructions are very efficient because you do not have a complicated address mode which slows down things (assuming you do millions of accesses, it will show!)
Note that your structure should be reorganized to get things aligned:
TurnNum (1 byte)
PAD (1 byte)
BoardState (2 bytes)
MT (16 bytes)
PM (16 bytes)
Otherwise you hit the memory at unaligned positions all the time and that definitively slows things down.
P.S. The x2, x4, and x8 were primarily added to processors so one could access arrays of pointers, with the added benefit that you could access structures of such sizes. It uses just 2 bits in the instruction, hence the limited range: 1 << n where n is 0, 1, 2, 3.
Upvotes: 3