C.Edelweiss

Reputation: 161

Issue with strchr() function implementation

I've recently started looking into assembly and I'm trying to recode some basic library functions to get a grip on it. I'm currently stuck on a segmentation fault at 0x0 in my strchr.

section .text
global strchr

strchr:
    xor rax, rax

loop:
    cmp BYTE [rdi + rax], 0
    jz end

    cmp sil, 0
    jz end

    cmp BYTE [rdi + rax], sil
    jz good

    inc rax
    jmp loop

good:
    mov rax, [rdi + rcx]
    ret

end:
    mov rax, 0
    ret

I can't figure out how to debug it using GDB, and the documentation I've come across is pretty limited or hard to understand.

I'm using the following main in C to test

#include <stdio.h>

extern char *strchr(const char *s, int c);

int main () {
   const char str[] = "random.string";
   const char ch = '.';
   char *ret;

   ret = strchr(str, ch);
   printf("%s\n", ret);
   printf("String after |%c| is - |%s|\n", ch, ret);

   return(0);
}

Upvotes: 1

Views: 778

Answers (1)

jfMR

Reputation: 24788

The Problem

The instruction immediately following the good label:

mov rax, [rdi + rcx]

should actually be:

lea rax, [rdi + rax]

The index is held in rax, not rcx (which is never initialized here), and what you need is the address of that position, not the value stored at it (i.e. lea instead of mov).
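
As a rough illustration of the address-versus-value distinction, here is a minimal C sketch. The helper names value_at and address_at are hypothetical and only serve as analogues: the first reads memory (like a mov from [rdi + rax], though the original mov into rax loads 8 bytes rather than one), the second only computes s + i (like lea).

```c
#include <stddef.h>

/* Hypothetical helper: analogue of a load such as `mov al, [rdi + rax]`.
 * It reads the byte stored at position i. */
static char value_at(const char *s, size_t i) {
    return s[i];
}

/* Hypothetical helper: analogue of `lea rax, [rdi + rax]`.
 * It computes the address s + i without touching memory. */
static const char *address_at(const char *s, size_t i) {
    return s + i;
}
```

strchr has to return what address_at produces. Returning the loaded bytes instead hands the caller a bogus pointer, which printf("%s") then tries to dereference.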


Some Advice

  1. Note that the typical idiom for comparing sil against zero is test sil, sil rather than cmp sil, 0. It would then be:

    test sil, sil
    jz end
    

    However, if we look at the strchr(3) man page, we can find the following:

    char *strchr(const char *s, int c);

    The terminating null byte is considered part of the string, so that if c is specified as '\0', these functions return a pointer to the terminator.

    So, if we want this strchr() implementation to behave as described in the man page, the following code must be removed:

    cmp sil, 0
    jz end 
    
  2. The typical zeroing idiom for the rax register is neither mov rax, 0 nor xor rax, rax, but rather xor eax, eax: it doesn't have to encode the immediate zero, and it saves one byte (the REX prefix) with respect to the latter. Writing to eax zero-extends into the full rax, so the result is the same.


With the correction and the advice above, the code would look like the following:

section .text
global strchr

strchr:
    xor eax, eax

loop:
    ; Is matched? Checked first so that c == 0 matches the terminator.
    cmp BYTE [rdi + rax], sil
    jz good

    ; Is end of string?
    cmp BYTE [rdi + rax], 0
    jz end

    inc rax
    jmp loop

good:
    lea rax, [rdi + rax]
    ret

end:
    xor eax, eax
    ret
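
To check an assembly version against known-good behavior, it can help to have a small C model of the same index-based loop. The sketch below is only that: ref_strchr and ref_strchr_self_test are hypothetical names, and the model deliberately tests for a match before testing for the terminator so that c == '\0' returns a pointer to the terminator, as the man page describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference model of the index-based loop. */
char *ref_strchr(const char *s, int c) {
    size_t i = 0;                     /* plays the role of rax */
    for (;;) {
        if (s[i] == (char)c)          /* cmp BYTE [rdi + rax], sil */
            return (char *)(s + i);   /* lea rax, [rdi + rax] */
        if (s[i] == '\0')             /* end of string, no match */
            return NULL;
        i++;                          /* inc rax */
    }
}

/* Compares the model with the C library's strchr on a few inputs. */
void ref_strchr_self_test(void) {
    const char str[] = "random.string";
    assert(ref_strchr(str, '.') == strchr(str, '.'));
    assert(ref_strchr(str, 'z') == NULL);
    assert(ref_strchr(str, '\0') == str + strlen(str));
    assert(ref_strchr("", 'a') == NULL);
}
```

Calling ref_strchr_self_test() from a test main, and then running the same inputs through the assembly routine, gives a quick way to spot differences such as the '\0' case.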

Upvotes: 5
