Stephen Burns

Reputation: 172

ARM assembly dereferencing string only retrieving 4 bytes

I have the following code in my ARM assembly program:

.data

.balign 4
prompt1: .asciz "Enter a string: "

.balign 4
scan1: .asciz "%s"

.balign 4
string_read: .word 0

.text

.global main

main:
  push {fp, lr}
  ldr r0, addr_prompt1
  bl printf

  ldr r0, addr_scan1
  ldr r1, addr_string_arg
  bl scanf

  ldr r2, addr_string_arg
  ldr r2, [r2]



addr_prompt1 : .word prompt1
addr_scan1 : .word scan1
addr_string_arg : .word string_read

I am using GDB PEDA to debug. Let us say that I input "test_cases" as my string. I can see that when I perform

ldr r2, addr_string_arg

it is holding an address that is pointing to the full string "test_cases". However, after I dereference

ldr r2, [r2]

r2 now holds the value (b'test'). When I try passing this to a function, it becomes "test/002". This happens with any string that is more than 4 characters long. I tried changing the values next to .balign as well as .word and neither of those helped.

Upvotes: 2

Views: 1864

Answers (1)

cooperised

Reputation: 2599

All of the comments under your question are relevant, but nobody's posted an actual answer yet, so here goes!

Strings are, by their nature, variable in length. For this reason they are almost invariably stored in memory, and passed around using a pointer to the first character. Also by convention (at least in the C world, and you're interacting with C libraries in your code) the length of the string is not stored anywhere, the end of the string being instead marked by a zero byte.

You're actually using strings in the correct conventional way in several places in your code - when you load the address of prompt1 (through the literal at addr_prompt1) to pass as the first argument to printf, for example. When you use the instruction

ldr r2, addr_string_arg

you are loading the address of the first character of the string argument into r2, and it would be standard practice to use that address to represent the string. When you subsequently write

ldr r2, [r2]

you are effectively dereferencing the pointer and loading four bytes (32 bits) starting from that address into r2. r2 doesn't contain the string, just the first four bytes of it; and indeed those bytes may not even be in the same order as they were in memory, depending on the endianness of your system.
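
To get at the characters themselves, the usual approach is to keep the pointer and read one byte at a time with ldrb until the zero terminator turns up. Here is a minimal sketch in GNU assembler syntax; my_strlen is a hypothetical helper name used only for illustration and is not part of your code:

```asm
.text

.global my_strlen

@ Hypothetical helper: r0 = pointer to a zero-terminated string.
@ Returns the number of characters (excluding the terminator) in r0.
my_strlen:
  mov r1, r0            @ r1 walks along the string
1:
  ldrb r2, [r1], #1     @ load ONE byte, then advance the pointer by 1
  cmp r2, #0            @ is it the terminating zero byte?
  bne 1b                @ no: fetch the next character
  sub r0, r1, r0        @ bytes consumed, including the terminator
  sub r0, r0, #1        @ drop the terminator from the count
  bx lr
```

The register only ever holds one character at a time; the string as a whole stays in memory and is identified by its address.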

Note also that you must allocate enough storage space to hold the longest string you could ever expect, and at the moment you are not doing so (string_read is only 4 bytes, which with the termination character allows for a three-character string). Allowing the string to overflow its buffer leads to undefined behaviour.
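
Putting these points together, a corrected version of your program might look roughly like the sketch below. The 64-byte buffer size, the matching "%63s" field width, and the echo1 format string are assumptions chosen for illustration; the rest follows the layout of your original code.

```asm
.data

.balign 4
prompt1: .asciz "Enter a string: "

.balign 4
scan1: .asciz "%63s"              @ field width keeps scanf inside the buffer

.balign 4
echo1: .asciz "You entered: %s\n" @ hypothetical format string for the demo

.balign 4
string_read: .space 64            @ assumed size: 63 characters + zero byte

.text

.global main

main:
  push {fp, lr}

  ldr r0, addr_prompt1
  bl printf

  ldr r0, addr_scan1
  ldr r1, addr_string_arg         @ r1 = address of the buffer
  bl scanf

  ldr r0, addr_echo1
  ldr r1, addr_string_arg         @ pass the pointer itself; no ldr r1, [r1]
  bl printf

  mov r0, #0                      @ return 0 from main
  pop {fp, pc}

addr_prompt1: .word prompt1
addr_scan1: .word scan1
addr_echo1: .word echo1
addr_string_arg: .word string_read
```

The second printf receives the same address that scanf was given, so it sees the whole string rather than just its first four bytes.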

As an aside, when you're loading arbitrary constants (including addresses) into registers the 'LDR pseudo-instruction' is what you want - and this uses an equals sign before the second argument. The assembler turns it into either a single move instruction, when the constant can be encoded as an immediate, or a PC-relative LDR from a literal pool that it generates for you (the PC-relative ADD form belongs to the separate ADR pseudo-instruction). Either way, you don't need to care about the details when you use it. For example,

ldr r2, =string_read

removes the need for your addr_string_arg.

Upvotes: 5
