How to compare addresses in x86, using the end pointer of a static array as a loop condition?

Question

One of the challenge questions in programming from the ground up is "to modify the program to use an ending address rather than the number 0 to know when to stop."

I am finding it difficult to do this since up to this point the book has only introduced movl, cmpl, incl (along with the addressing modes) and jmp instructions. Basically everything in the code snippet below is what has been introduced so far. All solutions I have found involve instructions not yet introduced in this book. The code below finds the maximum value from the set.

.section .data
data_items:             #These are the data items
.long 3,67,34,222,45,75,54,34,44,33,22,11,66,0

.section .text
.globl _start
_start:
    movl $0, %edi                   # move 0 into the index register
    movl data_items(,%edi,4), %eax  # load the first byte of data
    movl %eax, %ebx                 # since this is the first item, %eax is
                                    # the biggest
start_loop:                     # start loop
    cmpl $0, %eax                   # check to see if we’ve hit the end
    je loop_exit
    incl %edi                       # load next value
    movl data_items(,%edi,4), %eax
    cmpl %ebx, %eax                 # compare values
    jle start_loop                  # jump to loop beginning if the new
                                    # one isn’t bigger
    movl %eax, %ebx                 # move the value as the largest
    jmp start_loop                  # jump to loop beginning
loop_exit:
    # %ebx is the status code for the exit system call
    # and it already has the maximum number
    movl $1, %eax   #1 is the exit() syscall
    int $0x80

Note this question is distinctly different from the subsequent question which asks to modify the program to use a length count rather than the number 0. To me it seems like the address of the last number in the array should be stored in a register and then compared to the address of the pointer. I can't figure out a way to do this that fits the progression of this book since the book has only introduced the bare bones thus far.

Peter Cordes · Accepted Answer

You can do this with only mov and cmp, no lea required to calculate an end-pointer. (You don't have a length anywhere anyway to use with LEA).

You should add a new label at the end of the array, so you can refer to that position in memory (aka address). And remove the terminating 0 from the array because we're using addresses instead of a sentinel value.

.section .data
data_items:
  .long 3,67,34,222,45,75,54,34,44,33,22,11,66     # ,0   remove the sentinel / terminator
data_items_end:                                  # and add this new label

You don't need that address in a register; you can use cmp $data_items_end, %reg to use it as an immediate, with the linker filling in the right bytes into the machine code just like it does for your mov data_items(,%edi,4), %eax. (cmp symbol, %reg would compare with memory at that address. $symbol is the address as an immediate, in AT&T syntax.)

What you do need in a register is the start address, so you can increment and deref it. (For a function takes a pointer+length, you could compute the end address in a register.)

_start:
    mov  $data_items, %edi       # int *ptr = &data_items[0]
    mov  (%edi), %ebx            # current max
   # setting %eax is unnecessary here, it's always written before being read in this and the original version
loop_start:
    add  $4, %edi                # ptr++  (4 byte elements)
    cmp  $data_items_end, %edi
    je   loop_exit               # if (ptr == endp) break
    ...                  # compare with (%edi) and update %ebx if greater.
    jmp  loop_start
  ...

More efficient would be a do{}while loop structure like compilers use, especially since you know the array contains more than 1 element so you don't need to check for the case where the loop body should run 0 times. Notice that there's no unconditional jmp that has to execute every time, in addition to a cmp/jcc.

_start:
    mov  $data_items, %edi       # int *ptr = &data_items[0]
    mov  (%edi), %ebx            # current max

loop_start:                    # do{
    add  $4, %edi                # ptr++;  (4 byte elements)
  ## maybe update max:
    mov  (%edi), %eax            # tmp = *ptr;
    cmp  %ebx, %eax
    cmovg %eax, %ebx             # max = (tmp > max) ? tmp : max;
  ## end of loop body

    cmp  $data_items_end, %edi
    jne  loop_start            # }while(ptr != endp)
## end of loop, but nothing jumps here so no label is needed.

    mov  $1, %eax
    int  $0x80             # SYS_exit(%ebx)

I used cmp/cmovg (conditional move) instead of branching just because it's fewer instructions to type and no branching within the loop, making it easier to see the loop structure.

Other examples of looping and pointers:

Assembly Language (x86): How to create a loop to calculate Fibonacci sequence - functions that take a pointer+length as args, and use LEA to calculate an end pointer. (x86-64 NASM syntax)
How to check an "array's length" in Assembly Language (ASM), - defining an assemble-time constant based on the length of a .long static array, instead of putting a label at the end.
Copying to arrays in NASM - some tricks for writing efficient loops that loop over two arrays, e.g. indexing one relative to the other to still only use one increment but avoid indexed addressing modes. Or counting a negative index up towards zero, so you can still loop forwards in memory but still not need a separate cmp instruction, just inc / jnz.

How to compare addresses in x86, using the end pointer of a static array as a loop condition?

Answers (1)

Related Questions