MistFTW
MistFTW

Reputation: 79

Assembly: Creating an array of linked nodes

Firstly, if this question is inappropriate because I am not providing any code or not doing any thinking on my own, I apologize, and I will delete this question.

For an assignment, we are required to create an array of nodes to simulate a linked list. Each node has an integer value and a pointer to the next node in the list. Here is my .DATA section

.DATA
linked_list DWORD 5 DUP (?) ;We are allowed to assume the linked list will have 5 items
linked_node STRUCT
    value BYTE  ?
    next BYTE ?
linked_node ENDS

I am unsure if I am defining my STRUCT correctly, as I am unsure of what the type of next should be. Also, I am confused as how to approach this problem. To insert a node into linked_list I should be able to write mov [esi+TYPE linked_list*ecx], correct? Of course, I'd need to inc ecx every time. What I'm confused about is how to do mov linked_node.next, "pointer to next node" Is there some sort of operator that would allow me to set the pointer to the next index in the array equal to a linked_node.next ? Or am I thinking about this incorrectly? Any help would be appreciated!

Upvotes: 0

Views: 1308

Answers (1)

Peter Cordes
Peter Cordes

Reputation: 364821

Think about your design in terms of a language you are familiar with. Preferably C, because pointers and values in C are concepts that map directly to asm.

Let's say you want to keep track of your linked list by storing a pointer to the head element.

#include <stdint.h>  // for int8_t

struct node {
    int8_t next;  // array index.  More commonly, you'd use  struct node *next;
                  // negative values for .next are a sentinel, like a NULL pointer, marking the end of the list
    int8_t val;
};

struct node storage[5];  // .next field indexes into this array
uint8_t free_position = 0;  // when you need a new node, take index = free_position++;

int8_t head = -1;  // start with an empty list

There are tricks to reduce corner cases, like having the list head be a full node, rather than just a reference (pointer or index). You can treat it as a first element, instead of having to check for the empty-list case everywhere.

Anyway, given a node reference int8_t p (where p is the standard variable name for a pointer to a list node, in linked list code), the next node is storage[p.next]. The next node's val is storage[p.next].val.

Let's see what this looks like in asm. The NASM manual talks about how it's macro system can help you make code using global structs more readable, but I haven't done any macro stuff for this. You might define macros for NEXT and VAL or something, with 0 and 1, so you can say [storage + rdx*2 + NEXT]. Or even a macro that takes an argument, so you could say [NEXT(rdx*2)]. If you're not careful, you could end up with code that's more confusing to read, though.

section .bss
storage: resw 5   ;; reserve 5 words of zero-initialized space
free_position: db 0   ;; uint8_t free_position = 0;


section .data
head: db -1       ;; int8_t head = -1;

section .text


; p is stored in rdx.  It's an integer index into storage
;  We'll access  storage  directly, without loading it into a register.
;  (normally you'd have it in a reg, since it would be space you got from malloc/realloc)

     ; lea rsi, [rel storage]   ;; If you want RIP-relative addressing. 
     ;; There is no [RIP+offset + scale*index] addressing mode, because global arrays are for tiny / toy programs.

    test    edx, edx
    js  .err_empty_list       ;; check for p=empty list (sign-bit means negative)

    movsx   eax, byte [storage + 2*rdx]   ;; load p.next into eax, with sign-extension
    test    eax, eax
    js  .err_empty_list       ;; check that there is a next element

    movsx   eax, byte [storage + 2*rax + 1]  ;;  load storage[p.next].val, sign extended into eax
        ;; The final +1 in the effective address is because the val byte is 2nd.
        ;; you could have used a 3rd register if you wanted to keep p.next around for future use

    ret  ;; or not, if this is just the middle of some larger function


.err_empty_list:   ; .symbol is a local symbol, doesn't have to be unique for the whole file
    ud2   ; TODO: report an error instead of running an invalid insns

Notice that we get away with shorter instruction encoding by sign-extending into a 32bit reg, not to the full 64bit rax. If the value is negative, we aren't going to use rax as part of an address. We're just using movsx as a way to zero-out the rest of the register, because mov al, [storage + 2*rdx] would leave the upper 56 bits of rax with the old contents.

Another way to do this would be to movzx eax, byte [...] / test al, al, because the 8-bit test is just as fast to encode and execute as a 32bit test instruction. Also, movzx as a load has one cycle lower latency than movsx, on AMD Bulldozer-family CPUs (although they both still take an integer execution unit, unlike Intel where movsx/zx is handled entirely by a load port).

Either way, movsx or movzx is a good way to load 8-bit data, because you avoid problems with reading the full reg after writing a partial reg, and/or a false-dependency (on the previous contents of the upper bits of the reg, even if you know you already zeroed it, the CPU hardware still has to track it). Except if you know you're not optimizing for Intel pre-Haswell, then you don't have to worry about partial-register writes. Haswell does dual-bookkeeping or something to avoid extra uops to merge the partial value with the old full value when reading. AMD CPUs, P4, and Silvermont don't track partial-regs separately from the full-reg, so all you have to worry about is the false dependency.


Also note that you can load the next and val packed together, like

.search_loop:
    movzx    eax,  word [storage + rdx*2]   ; next in al,  val in ah
    test     ah, ah
    jz   .found_a_zero_val
    movzx    edx, al        ; use .next for the next iteration
    test     al, al
    jns   .search_loop

    ;; if we get here, we didn't find a zero val
    ret

.found_a_zero_val:
    ;; do something with the element referred to by `rdx`

Notice how we have to use movzx anyway, because all the registers in an effective address have to be the same size. (So word [storage + al*2] doesn't work.)

This is probably more useful going the other way, to store both fields of a node with a single store, like mov [storage + rdx*2], ax or something, after getting next into al, and val into ah, probably from separate sources. (This is a case where you might want to use a regular byte load, instead of a movzx, if you don't already have it in another register). This isn't a big deal: don't make your code hard to read or more complex just to avoid doing two byte-stores. At least, not until you find out that store-port uops are the bottleneck in some loop.


Using an index into an array, instead of a pointer, can save a lot of space, esp. on 64bit systems where pointers take 8 bytes. If you don't need to free individual nodes (i.e. data structure only ever grows, or is deleted all at once when it is deleted), then an allocator for new nodes is trivial: just keep sticking them at the end of the array, and realloc(3). Or use a c++ std::vector.


With those building blocks, you should be all set to implement the usual linked list algos. Just store bytes with mov [storage + rdx*2], al or whatever.

If you need ideas on how to implement linked lists with clean algos that handle all the special-cases with as few branches as possible, have a look at this Codereview question. It's for Java, but my answer is very C-style. The other answers have some nice tricks, too, some of which I borrowed for my answer. (e.g. using a dummy node avoids branching to handle the insertion-as-a-new-head special case).

Upvotes: 1

Related Questions