Reputation: 172
I have the following code in my ARM assembly program:
.data
.balign 4
prompt1: .asciz "Enter a string: "
.balign 4
scan1: .asciz "%s"
.balign 4
string_read: .word 0
.text
.global main
main:
push {fp, lr}
ldr r0, addr_prompt1
bl printf
ldr r0, addr_scan1
ldr r1, addr_string_arg
bl scanf
ldr r2, addr_string_arg
ldr r2, [r2]
addr_prompt1 : .word prompt1
addr_scan1 : .word scan1
addr_string_arg : .word string_read
I am using GDB PEDA to debug. Let us say that I input "test_cases" as my string. I can see that when I perform
ldr r2, addr_string_arg
it is holding an address that is pointing to the full string "test_cases". However, after I dereference
ldr r2, [r2]
r2 now holds the value (b'test'). When I try passing this to a function, it becomes "test/002". This happens with any string that is more than 4 characters long. I tried changing the values next to .balign as well as .word and neither of those helped.
Upvotes: 2
Views: 1864
Reputation: 2599
All of the comments under your question are relevant, but nobody's posted an actual answer yet, so here goes!
Strings are, by their nature, variable in length. For this reason they are almost invariably stored in memory, and passed around using a pointer to the first character. Also by convention (at least in the C world, and you're interacting with C libraries in your code) the length of the string is not stored anywhere, the end of the string being instead marked by a zero byte.
You're actually using strings in the correct conventional way in several places in your code: when you load the address of prompt1 (through addr_prompt1) to pass as the first argument to printf, for example. When you use the instruction
ldr r2, addr_string_arg
you are loading the address of the first character of the string buffer into r2, and it would be standard practice to use that address to represent the string. When you subsequently write
ldr r2, [r2]
you are dereferencing the pointer and loading four bytes (32 bits) starting from that address into r2. r2 doesn't contain the string, just the first four bytes of it, and the order in which those bytes are packed into the register depends on the endianness of your system.
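To make the difference between the pointer and the loaded word concrete, here is a minimal sketch. It assumes 32-bit ARM Linux, GNU as syntax and linking with gcc so that printf is available; the labels msg, fmt, addr_msg and addr_fmt are names I made up for the illustration.

```asm
        .data
        .balign 4
msg:    .asciz "test_cases"
fmt:    .asciz "as pointer: %s\nas word:    0x%08x\n"

        .text
        .global main
main:
        push {fp, lr}
        ldr r1, addr_msg        @ r1 = address of the first character
        ldr r2, [r1]            @ r2 = the first four bytes only ("test")
        ldr r0, addr_fmt
        bl printf               @ printf(fmt, pointer in r1, word in r2)
        mov r0, #0              @ return 0 from main
        pop {fp, pc}

addr_msg: .word msg
addr_fmt: .word fmt
```

On a little-endian system the second line of output should read 0x74736574: the bytes 't', 'e', 's', 't' with the first character in the least significant byte. The %s conversion, which is handed the pointer, still sees the whole string.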
Note also that you must allocate enough storage to hold the longest string you expect to read, and at the moment you are not doing so: string_read is a single .word, i.e. only 4 bytes, which with the terminating zero byte allows for a three-character string. Letting scanf write past the end of the buffer leads to undefined behaviour.
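One way to address the storage problem is to reserve a larger buffer with .space and give scanf a field width so it cannot write past the end. This is a sketch under the same assumptions as your program (32-bit ARM Linux, GNU as, linked with gcc); the 64-byte size and the labels buffer, scanfmt and outfmt are arbitrary choices of mine, not anything your code requires.

```asm
        .data
prompt:  .asciz "Enter a string: "
scanfmt: .asciz "%63s"          @ width limit: at most 63 characters plus the zero byte
outfmt:  .asciz "You typed: %s\n"
        .balign 4
buffer:  .space 64              @ 64 bytes of zero-filled storage

        .text
        .global main
main:
        push {fp, lr}
        ldr r0, addr_prompt
        bl printf
        ldr r0, addr_scanfmt
        ldr r1, addr_buffer     @ scanf needs the address to write to
        bl scanf
        ldr r0, addr_outfmt
        ldr r1, addr_buffer     @ pass the pointer itself, do not dereference it
        bl printf
        mov r0, #0              @ return 0 from main
        pop {fp, pc}

addr_prompt:  .word prompt
addr_scanfmt: .word scanfmt
addr_outfmt:  .word outfmt
addr_buffer:  .word buffer
```

The width in "%63s" is one less than the buffer size because scanf appends the terminating zero byte after the characters it stores.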
As an aside, when you're loading arbitrary constants (including addresses) into registers, the LDR pseudo-instruction is what you want; it takes an equals sign before the second operand. The assembler turns it into either a single MOV or MVN, when the constant can be encoded as an immediate, or a PC-relative LDR from a literal pool that it generates for you, but you don't need to care about that when you use it. For example,
ldr r2, =string_read
removes the need for your addr_string_arg.
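As a small illustration of the pseudo-instruction, the sketch below prints a message without any hand-written address words. It again assumes 32-bit ARM Linux, GNU as and linking with gcc; the label msg is a made-up name, and the .ltorg directive is optional here since the assembler would otherwise place the literal pool at the end of the section by itself.

```asm
        .data
msg:    .asciz "hello from ldr =\n"

        .text
        .global main
main:
        push {fp, lr}
        ldr r0, =msg            @ address is fetched from an assembler-generated literal pool
        bl printf
        ldr r0, =0              @ small constant: the assembler can emit a plain mov instead
        pop {fp, pc}
        .ltorg                  @ ask the assembler to place the literal pool here, after the code
```

If you disassemble the result (for example with objdump -d), you should see the first ldr turned into a PC-relative load and the word holding the address of msg sitting after the pop, which is exactly what your addr_ labels were doing by hand.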
Upvotes: 5