Reputation: 41
I've been writing my own naive implementation of the strcmp function in x86_64 assembly.
I've written a c test program to compare the behavior with the real strcmp from the C library.
Even if both function seems to return the same value, the condition
(strcmp(s1,s2) == my_strcmp(s1,s2))
seems to be false.
Weirdly, my_strcmp(s1,s2) == strcmp(s1,s2)
seems to be true.
What am I missing ?
While troubleshooting, i've compiled with my implementation of strcmp in C and the same test program, just to verify : it works well, all behaviors are expected.
my_strcmp.s :
; Function: int my_strcmp(const char *dest, const char *src);
; Arguments:
; rdi s1
; rsi s2
; Variable:
; rcx counter
; Returned value :
section .text
global my_strcmp
my_strcmp:
; Function prologue (optional)
push rbp
mov rbp, rsp
xor rcx,rcx
loop:
mov al, byte [rdi]
mov bl, byte [rsi]
cmp al, bl
jne not_equal
test al,al
je null_byte_found
inc rdi
inc rsi
jmp loop
not_equal:
sub al, bl
movsx rax, al
jmp end
null_byte_found:
xor rax,rax
jmp end
end:
; Function epilogue (optional)
mov rsp, rbp
pop rbp
ret
main.c ( the test program) :
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
extern int my_strcmp(const char *s1, const char *s2);
void test_my_strcmp();
int main() {
test_my_strcmp();
return 0;
}
void test_my_strcmp(){
char s1[10] = "123456789";
char s2[10] = "12345AAA";
int a = strcmp(s1,s2);
int b = my_strcmp(s1,s2);
if ( a == b)
printf("a == b\n");
if (strcmp(s1,s2) == -11)
printf("strcmp return value is -11\n");
if (-11 == my_strcmp(s1,s2))
printf("my_strcmp retun value is -11\n");
if (strcmp(s1,s2) == my_strcmp(s1,s2))
printf("OK1\n");
if (my_strcmp(s1,s2) == strcmp(s1,s2))
printf("OK2\n");
}
my_strcmp.c (just to verify) :
int my_strcmp(const char *s1, const char *s2){
while (*s1 && *s2 && (*s1 == *s2))
{
s1++;
s2++;
}
return (*s1 - *s2);
}
After creating the 3 files my_strcmp.s
, my_strcmp.c
and main.c
,
nasm -f elf64 -g my_strcmp.s -o my_strcmp_assembly.o
gcc -c my_strcmp.c
gcc main.c my_strcmp.o -o c_version.out
gcc main.c my_strcmp_assembly.o -o asm_version.out
./c_version.out && ./asm_version.out
a == b
strcmp return value is -11
ft_strcmp retun value is -11
OK1
OK2
a == b
strcmp return value is -11
ft_strcmp retun value is -11
OK2
gcc version : 11.4.0 (up to date) nasm version : 2.15.05 (up to date)
``
Upvotes: 4
Views: 154
Reputation: 154156
In addition to what @Craig Estey and problems commented elsewhere,return (*s1 - *s2);
is a problem when *s1
and *s2
differ in sign. @Nate Eldredge
strcmp()
performs as if each character was an unsigned char
.
For all functions in this subclause, each character shall be interpreted as if it had the type
unsigned char
(and therefore every possible object representation is valid and has a different value). C2x dr § 7.26.1 4
A better re-invented strmcp()
int my_strcmp(const char *s1, const char *s2){
const unsigned char *uc1 = (const unsigned char *) s1;
const unsigned char *uc2 = (const unsigned char *) s2;
while ((*uc1 == *uc2) && *uc1) { // Do an unsigned access and compare
uc1++;
uc2++;
}
return ((*uc1 > *uc2) - (*uc1 < *uc2)); // No overflow possible
}
Pedantically, this also fixes potential overflow of subtracting 2 int
values that may overflow and avoids (nearly extinct) problems with non-2's complement encoding of a signed char
.
Upvotes: 7
Reputation: 33631
You are using %bl
as a temp register.
%rbx
during your function prolog/epilog or use another register.See: Section 3.2.1 of https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf for a list of registers and how they're used by caller/callee
Upvotes: 10