Reputation: 1660
I'm trying to learn some assembly, specifically ARM64.
I'm trying to initialize an array of 16-bit integers to some fixed value (123).
Here's what I have:
.global _main
.align 2
_main:
mov x0, #0 ; start with a 0-byte offset
mov x1, #123 ; the value to set each 16-bit element to
lsl x1, x1, #48 ; shift this value to the upper 16-bits of the register
loop:
str x1, [sp, x0] ; store the full 64-bit register at some byte offset
add x0, x0, #2 ; advance by two bytes (16-bits)
cmp x0, #10 ; loop until we've written five 16-bit integers
b.ne loop
ldr x2, [sp] ; load the first four 16-bit integers into x2
ubfm x0, x2, #48, #63 ; set the exit status to the leftmost 16-bit integer
mov x16, #1 ; system exit
svc 0 ; supervisor call
I'm expecting the exit status to be 123 but it's 0. I don't understand why.
If I comment out the last two lines of the loop the exit status is 123 which is correct.
Please could someone explain what's going on? Is it an alignment problem?
Thanks
Upvotes: 1
Views: 1033
Reputation: 364160
The "fun" way:
AArch64 can put a repeating pattern (of any power-of-2 length) into a 64-bit register efficiently, if all the set bits are contiguous inside each repeat. (That's how it encodes immediates for bitwise boolean instructions like orr x1, xzr, #0x0303030303030303 = mov x1, #... .) This is almost true for your 123 = 0x7b = 0b1111011.
Alternatively, ldr x1, =0x007B007B007B007B will ask the assembler to do it for you; in this case GAS chooses to put the constant in memory nearby and load it with a PC-relative addressing mode.
You can reserve space for your array at the same time as you store it to the stack by using a store with a write-back addressing mode that updates the base register (sp in this case) with the offset you subtract. This is how AArch64 implements stack "push" operations efficiently. For example, in a function that needs to save some registers, GCC uses stp x29, x30, [sp, -32]! on function entry to subtract 32 bytes from SP as well as STore that Pair of registers at the bottom of that space. (Godbolt example)
So I think this should work. This does assemble but I haven't tried running it. AArch64's standard calling convention maintains 16-byte stack alignment so this store-pair 16-byte store is aligned.
mov x0, #0x7f007f007f007f
and x0, x0, #~0x0004000400040004 // construct 0x007B repeating
stp x0, x0, [sp, -16]! // push x0 twice
// SP now points at 8 copies of (uint16_t)123, below whatever was on the stack before
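As a quick sanity check of the constant construction above, here is a minimal C sketch that models the mov/and pair with plain 64-bit arithmetic. The function name repeat_123 is hypothetical and only for illustration; it does not model the instruction encodings, only the resulting value.

```c
#include <stdint.h>

/* Hypothetical helper: models the two-instruction sequence from the answer. */
uint64_t repeat_123(void)
{
    uint64_t x0 = 0x007f007f007f007fULL;  /* mov x0, #0x7f007f007f007f          */
    x0 &= ~0x0004000400040004ULL;         /* and x0, x0, #~0x0004000400040004   */
    return x0;                            /* 0x7f with bit 2 cleared is 0x7b    */
}
```

Each 16-bit lane of the result is 0x007b, i.e. 123, so the stp in the answer writes eight such elements.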
Loops with strh (store 16-bit half-word) are for boring compilers; when hand-writing, try to get as much done with as few instructions as possible. (That's a general rule of thumb, not always correlated with performance! For example, a wide load that only partially overlaps a previous store may cause a store-forwarding stall if the store was very recent.)
Upvotes: 2
Reputation: 5895
Assuming you are running your program on a little-endian AArch64 system, each loop iteration overwrites bytes that the previous iteration had just written:
You are actually writing the bytes:
0x00 0x00 0x00 0x00 0x00 0x00 0x7b 0x00
from: sp + x0 + 0
to: sp + x0 + 7
at each iteration.
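The overlapping stores can also be modeled outside the debugger. The following is a minimal C sketch, assuming a little-endian byte order and a 0xff-filled scratch buffer standing in for the stack; the function names (store64_le, load64_le, run_question_loop) are hypothetical and not part of the original program.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `value` to buf[off..off+7], least significant byte first,
   the way a little-endian `str x1, [sp, x0]` would. */
void store64_le(uint8_t *buf, size_t off, uint64_t value)
{
    for (size_t i = 0; i < 8; i++)
        buf[off + i] = (uint8_t)(value >> (8 * i));
}

/* Read buf[off..off+7] as a little-endian 64-bit value (`ldr x2, [sp]`). */
uint64_t load64_le(const uint8_t *buf, size_t off)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)buf[off + i] << (8 * i);
    return v;
}

/* Replay the question's loop: five 8-byte stores at byte offsets 0, 2, 4, 6, 8.
   Returns what the final 64-bit load from offset 0 would see. */
uint64_t run_question_loop(uint64_t x1)
{
    uint8_t stack[24];                       /* scratch stand-in for [sp..] */
    for (size_t i = 0; i < sizeof stack; i++)
        stack[i] = 0xff;
    for (size_t x0 = 0; x0 != 10; x0 += 2)   /* cmp x0, #10 ; b.ne loop */
        store64_le(stack, x0, x1);
    return load64_le(stack, 0);
}
```

Under this model, run_question_loop((uint64_t)123 << 48) returns 0, while run_question_loop(123) returns 0x007b007b007b007b, which matches the two gdb sessions in this answer.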
Initial conditions:
(gdb) p/x $sp
$3 = 0x40010000
(gdb) x/12xh 0x40010000
0x40010000: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x40010010: 0x0000 0x0000 0x0000 0x0000
# initializing some memory to 0xff so that we may see what is going on
(gdb) set {unsigned long }(0x40010000) = 0xffffffffffffffff
(gdb) set {unsigned long }(0x40010008) = 0xffffffffffffffff
(gdb) set {unsigned long }(0x40010010) = 0xffffffffffffffff
(gdb) x/12xh 0x40010000
0x40010000: 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff
0x40010010: 0xffff 0xffff 0xffff 0xffff
Before first loop:
(gdb) p/x$x1
$4 = 0x7b000000000000
Looping:
# pass #1, after str x1, [sp, x0]
(gdb) x/12xh 0x40010000
0x40010000: 0x0000 0x0000 0x0000 0x007b 0xffff 0xffff 0xffff 0xffff
0x40010010: 0xffff 0xffff 0xffff 0xffff
# pass #2, after str x1, [sp, x0]
(gdb) x/12xh 0x40010000
0x40010000: 0x0000 0x0000 0x0000 0x0000 0x007b 0xffff 0xffff 0xffff
0x40010010: 0xffff 0xffff 0xffff 0xffff
# pass #3, after str x1, [sp, x0]
(gdb) x/12xh 0x40010000
0x40010000: 0x0000 0x0000 0x0000 0x0000 0x0000 0x007b 0xffff 0xffff
0x40010010: 0xffff 0xffff 0xffff 0xffff
# pass #4, after str x1, [sp, x0]
(gdb) x/12xh 0x40010000
0x40010000: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x007b 0xffff
0x40010010: 0xffff 0xffff 0xffff 0xffff
# pass #5, after str x1, [sp, x0]
(gdb) x/12xh 0x40010000
0x40010000: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x007b
0x40010010: 0xffff 0xffff 0xffff 0xffff
# after ldr x2, [sp]:
(gdb) p/x $sp
$2 = 0x40010000
(gdb) p/x $x2
$3 = 0x0
(gdb)
Your program would work if you were not shifting the value in x1, i.e. with lsl x1, x1, #48 commented out:
# after ldr x2, [sp]
(gdb) x/12xh 0x40010000
0x40010000: 0x007b 0x007b 0x007b 0x007b 0x007b 0x0000 0x0000 0x0000
0x40010010: 0x0000 0x0000 0x0000 0x0000
(gdb) p/x $x2
$1 = 0x7b007b007b007b
(gdb)
That being said, it would probably be better to use the strh instruction, so that you avoid writing more bytes than you should (8 instead of 2) at each iteration of your loop. Even without the shift, the last 64-bit store writes 6 bytes past the end of the five-element array.
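To illustrate the difference a half-word store makes, here is a small C sketch of a loop built on a 2-byte little-endian store, analogous to strh w1, [sp, x0]. The names store16_le and fill_u16 are hypothetical, and the buffer is an assumed stand-in for the stack area.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the 16-bit `value` to buf[off..off+1], least significant byte first,
   the way a little-endian `strh w1, [sp, x0]` would. */
void store16_le(uint8_t *buf, size_t off, uint16_t value)
{
    buf[off]     = (uint8_t)(value & 0xff);
    buf[off + 1] = (uint8_t)(value >> 8);
}

/* Fill `count` 16-bit elements starting at buf[0].
   Returns the number of bytes written, which is exactly 2 * count. */
size_t fill_u16(uint8_t *buf, size_t count, uint16_t value)
{
    size_t off = 0;
    for (; off != 2 * count; off += 2)       /* add x0, x0, #2 ; cmp ; b.ne */
        store16_le(buf, off, value);
    return off;
}
```

With count = 5 and value = 123, only the first 10 bytes of the buffer are touched, so neighbouring memory is left alone and no store overlaps the previous one.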
Bottom line: on a little-endian system, the constant 0x000000000000007b will be stored in memory (ascending addresses) as 7b 00 00 00 00 00 00 00, and the constant 0x007b000000000000 will be stored as 00 00 00 00 00 00 7b 00.
Because of the shift, you are storing 0x007b000000000000, not 0x000000000000007b, into memory, and each later 8-byte store zeroes the 0x7b written by the one before it.
Upvotes: 2