.data
.global _start
_start:
mov r7, #4
mov r0, #1
mov r2, #12
ldr r4, =#0x6c6c6548
str r4, [pc, #4]
mov r1, pc
add pc, pc, #8
strbt r6, [ip], -r8, asr #10
svcvs 0x0057206f
beq 0x193b248
swi #0
mov r7, #1
mov r0, #0
swi #0
I stumbled upon this little ARM assembly program, which prints "Hello World". To test it, save it as test.s and run:
$ as -o test.o test.s
$ ld -o test test.o
$ ./test
Hello World
$
How does this work? I cannot see a single string in the entire program. It also doesn't read the string from anywhere else; it looks like this code is all that's needed to print the string. Where does the string come from?
Here's an annotation of the interesting bit:
mov r7, #4
mov r0, #1
mov r2, #12
ldr r4, =#0x6c6c6548
A str r4, [pc, #4]
B mov r1, pc
C add pc, pc, #8
D strbt r6, [ip], -r8, asr #10
E svcvs 0x0057206f
F beq 0x193b248
G swi #0
mov r7, #1
mov r0, #0
swi #0
The store at A targets location D. In ARM state, reading pc gives the address of the current instruction plus 8, so [pc, #4] is 12 bytes past A. As pointed out in the comments, the word in r4 (0x6c6c6548, in little-endian byte order) is the 4 ASCII bytes "Hell". They get stored over the top of the nonsensical instruction there, whose machine code is 0xe66c6548: close, but not good enough. That's presumably also why this is in the data section, to ensure that it is writeable*. Meanwhile, the machine code of the instruction at E is 0x6f57206f, which makes "o Wo". Instruction F is particularly tricksy, as that address must result in the relative branch offset, once encoded, looking like "rld"**. The beq encoding is 0x0annnnnn, where nnnnnn is the top 24 bits of a 26-bit two's complement offset value. Note also that the condition code and opcode in the top byte there (0x0a) make up the final newline.
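To check the byte arithmetic above, here is a small Python sketch (not part of the original program) that lays the three words at D, E and F out in little-endian order. The first two words are quoted in the answer; the third, 0x0a646c72, is my reconstruction from the beq layout just described (top byte 0x0a, offset field spelling "rld"), so treat it as an assumption.

```python
import struct

# Words that end up at D, E and F once the store at A has run.
# 0x6c6c6548 and 0x6f57206f are quoted in the answer; 0x0a646c72 is
# reconstructed from the beq layout (top byte 0x0a, offset bits spelling "rld").
WORDS = [0x6c6c6548, 0x6f57206f, 0x0a646c72]


def words_to_bytes(words):
    """Serialize 32-bit words least significant byte first (little-endian)."""
    return b"".join(struct.pack("<I", w) for w in words)


message = words_to_bytes(WORDS)
print(message)       # b'Hello World\n'
print(len(message))  # 12, the count loaded into r2

# The strbt originally assembled at D differs only in its top byte.
print(words_to_bytes([0xe66c6548]))  # b'Hel\xe6'
```

The last line shows why the store at A is needed: the assembled strbt already supplies "Hel", but its condition-code byte 0xe6 is not an "l".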
Instruction B puts the address of D into r1, i.e. a pointer to the start of the string (pc again reads as B + 8, which is D). r0 and r2 are the other necessary syscall arguments, and r7 is the syscall number itself: on ARM Linux, syscall 4 is write, the 1 in r0 is the file descriptor for stdout, and the 12 in r2 is the length of "Hello World\n".
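For comparison, the first syscall boils down to write(1, buf, 12). A rough Python equivalent, assuming a Linux-like system where `os.write` wraps the same write(2) call, is sketched below; the variable names simply mirror the registers and are illustrative only.

```python
import os

# Register values loaded before the first "swi #0" in the listing.
r7 = 4    # syscall number for write on ARM Linux (shown for reference only)
r0 = 1    # file descriptor: stdout
r2 = 12   # number of bytes to write
# Stand-in for the buffer r1 points at (D..F after A has patched D).
buf = b"Hello World\n"

# os.write(fd, data) is a thin wrapper around the write(2) system call
# and returns the number of bytes actually written.
written = os.write(r0, buf[:r2])
```

The exit syscall that follows corresponds to r7 = 1 with the status 0 in r0.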
Finally, instruction C is a jump to the syscall at G: pc reads as C + 8, and adding 8 more skips the three words at D, E and F, so none of those "instructions" are actually executed. The rest after that is just making an exit syscall (r7 = 1 is exit, with status 0 in r0).
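The three pc-relative tricks (A, B and C) all rest on one rule: in ARM state, an instruction that reads pc sees its own address plus 8. A minimal Python model of that rule, assuming each labelled instruction is 4 bytes and measuring addresses as offsets from A, confirms where each one lands.

```python
# Byte offsets of the labelled instructions relative to A (4 bytes each).
LABELS = "ABCDEFG"
addr = {name: 4 * i for i, name in enumerate(LABELS)}


def pc_value(insn):
    """In ARM state, reading pc yields the instruction's own address plus 8."""
    return addr[insn] + 8


def label_at(offset):
    """Map a byte offset back to the label of the instruction stored there."""
    return LABELS[offset // 4]


store_target = pc_value("A") + 4   # str r4, [pc, #4]
r1 = pc_value("B")                 # mov r1, pc
jump_target = pc_value("C") + 8    # add pc, pc, #8

print(label_at(store_target), label_at(r1), label_at(jump_target))  # D D G
```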
Pretty neat, for trick code.
* and presumably also relying on some backwards-compatibility behaviour in the loader to leave the data section executable.
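** Which, incidentally, doesn't happen with my binutils 2.26 linker, probably due to the default section alignment having changed in recent versions. Since beq is given an absolute target, if F ends up at a different address the encoded offset no longer spells "rld".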
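The sensitivity to link address can be sketched by encoding the beq by hand. The helper below is hypothetical (not from the original post) and follows the standard ARM-state branch layout: offset from pc (instruction address + 8), shifted right by 2 into a 24-bit field, with EQ = 0x0 and the B opcode in the top byte. The address 0x20078 for F is not stated in the listing; it is worked back from the target 0x193b248 as the address at which the word spells "rld\n". The second address is purely hypothetical.

```python
import struct


def encode_beq(insn_addr, target):
    """Encode an ARM-state 'beq target' placed at insn_addr.

    The byte offset is measured from pc (insn_addr + 8); it must be a
    multiple of 4 and fit in 26 signed bits, and its top 24 bits go in
    the low 24 bits of the instruction word.
    """
    offset = target - (insn_addr + 8)
    if offset % 4 or not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("branch target out of range or misaligned")
    imm24 = (offset >> 2) & 0xFFFFFF   # 24-bit two's complement field
    cond_eq = 0x0                      # EQ condition code
    return (cond_eq << 28) | (0b101 << 25) | imm24  # 0b101 = B, link bit clear


TARGET = 0x193b248  # from "beq 0x193b248" in the listing

# F at 0x20078 (derived, not stated in the post): the word spells "rld\n".
print(struct.pack("<I", encode_beq(0x20078, TARGET)))  # b'rld\n'

# Hypothetical: the same code linked 0x10000 bytes lower.
print(struct.pack("<I", encode_beq(0x10078, TARGET)))  # b'r\xacd\n'
```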