Reputation: 148
I'm trying to write assembly code that counts how many characters are in a string, but I get an error.
My code (I'm using gcc with intel_syntax):
#include <stdio.h>
int main(){
char *s = "aqr b qabxx xryc pqr";
int x;
asm volatile (
".intel_syntax noprefix;"
"mov eax, %1;"
"xor ebx,ebx;"
"loop:"
"mov al,[eax];"
"or al, al;"
"jz print;"
"inc ebx;"
"jmp loop"
"print:"
"mov %0, ebx;"
".att_syntax prefix;"
: "=r" (x)
: "r" (s)
: "eax", "ebx"
);
printf("Length of string: %d\n", x);
return 0;
}
And I get this error:
Error: invalid use of register
Ultimately I want to write a program that searches for the regex pattern ([pq][^a]+a) and prints its start position and length. I wrote it in C, but I have to make it work in assembly. My C code:
#include <stdio.h>
#include <string.h>
int main(){
char *s = "aqr b qabxx xryc pqr";
int y=0,i;
int x=-1,length=0, pos = 0;
int len = strlen(s);
for(i=0; i<len;i++){
if((s[i] == 'p' || s[i] == 'q') && length<=0){
pos = i;
length++;
continue;
} else if((s[i] != 'a') && pos>0){
length++;
} else if((s[i] == 'a') && pos>0){
length++;
if(y < length) {
y=length;
length = 0;
x = pos;
pos = 0;
}
else
length = 0;
pos = 0;
}
}
printf("position: %d, length: %d", x, y);
return 0;
}
Upvotes: 0
Views: 2122
Reputation: 7483
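You omitted the semicolon after "jmp loop" and after "print:". Adjacent string literals are concatenated, so without a separator the assembler sees "jmp loopprint:" as a single line.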
Also, your asm isn't going to work correctly. You move the pointer to s into eax, but then you overwrite it with mov al,[eax] (al is the low byte of eax). So on the next pass through the loop, eax doesn't point to the string anymore.
And when you fix that, you need to think about the fact that each pass through the loop needs to change eax to point to the next character; otherwise mov al,[eax] keeps reading the same character.
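In plain C, the loop the asm needs to implement looks roughly like the sketch below (count_chars is a hypothetical name used only for illustration). The byte that is read goes into its own variable rather than over the pointer, and the pointer advances on every pass:

```c
#include <stddef.h>

/* Hypothetical helper: a C model of the corrected loop, not the asm itself. */
size_t count_chars(const char *p)
{
    size_t count = 0;          /* plays the role of ebx */
    for (;;) {
        char c = *p;           /* load the byte without overwriting the pointer */
        if (c == 0)            /* terminator reached: stop counting */
            break;
        ++count;
        ++p;                   /* advance to the next character */
    }
    return count;
}
```

The asm equivalent needs the same two things: a scratch register for the loaded byte (or a direct compare against memory) and an increment of the pointer register.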
Since you haven't accepted an answer yet (by clicking the checkmark to the left), there's still time for one more edit.
Normally I don't "do people's homework", but it's been a few days. Presumably the due date for the assignment has passed. Such being the case, here are a few solutions, both for the education of the OP and for future SO users:
1) Following the (somewhat odd) limitations of the assignment:
asm volatile (
".intel_syntax noprefix;"
"mov eax, %1;"
"xor ebx,ebx;"
"cmp byte ptr[eax], 0;"
"jz print;"
"loop:"
"inc ebx;"
"inc eax;"
"cmp byte ptr[eax], 0;"
"jnz loop;"
"print:"
"mov %0, ebx;"
".att_syntax prefix;"
: "=r" (x)
: "r" (s)
: "eax", "ebx"
);
2) Violating some of the assignment rules to make slightly better code:
asm (
"\n.intel_syntax noprefix\n\t"
"mov eax, %1\n\t"
"xor %0,%0\n\t"
"cmp byte ptr[eax], 0\n\t"
"jz print\n"
"loop:\n\t"
"inc %0\n\t"
"inc eax\n\t"
"cmp byte ptr[eax], 0\n\t"
"jnz loop\n"
"print:\n"
".att_syntax prefix"
: "=r" (x)
: "r" (s)
: "eax", "cc", "memory"
);
This uses 1 fewer register (no ebx) and omits the (unnecessary) volatile qualifier. It also adds the "cc" clobber to indicate that the code modifies the flags, and uses the "memory" clobber to ensure that any 'pending' writes to s get flushed to memory before executing the asm. It also uses formatting (\n\t) so the output from building with -S is readable.
3) Advanced version which uses even fewer registers (no eax), checks to ensure that s is not NULL (returns -1), uses symbolic names, and assumes -masm=intel, which results in more readable code:
__asm__ (
"test %[string], %[string]\n\t"
"jz print\n"
"loop:\n\t"
"inc %[length]\n\t"
"cmp byte ptr[%[string] + %[length]], 0\n\t"
"jnz loop\n"
"print:"
: [length] "=r" (x)
: [string] "r" (s), "[length]" (-1)
: "cc", "memory"
);
Getting rid of the (arbitrary and not well thought out) assignment constraints allows us to reduce this to 7 lines (5 if we don't check for NULL, 3 if we don't count labels [which aren't actually instructions]).
There are ways to improve this even further (using %= on the labels to avoid possible duplicate symbol issues, using local labels (.L), even writing it so it works for both -masm=intel and -masm=att, etc.), but I daresay that any of these 3 are better than the code in the original question.
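As a rough sketch of those last ideas (not code from the assignment): the version below puts %= on local .L labels and uses GCC's {att|intel} dialect alternatives so that the same template should assemble under either -masm setting. The names asm_strlen and p, and the plain-C fallback for non-x86 targets, are assumptions added for illustration.

```c
#include <stddef.h>

/* Hypothetical function name; counts bytes up to the terminating 0. */
size_t asm_strlen(const char *s)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    const char *p = s;                 /* pointer the asm advances in place */
    __asm__ (
        /* {att|intel}: only the half matching -masm is emitted */
        "cmp{b $0, (%[p])| byte ptr [%[p]], 0}\n\t"
        "je .Ldone%=\n"                /* empty string: nothing to count */
        ".Lloop%=:\n\t"                /* %= makes the label unique per asm */
        "inc %[p]\n\t"
        "cmp{b $0, (%[p])| byte ptr [%[p]], 0}\n\t"
        "jne .Lloop%=\n"
        ".Ldone%=:"
        : [p] "+r" (p)                 /* read and written by the asm */
        :
        : "cc", "memory"               /* flags change; memory is read */
    );
    return (size_t)(p - s);            /* length = end pointer - start */
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
#endif
}
```

Doing the final subtraction in C instead of inside the asm keeps the template shorter and lets the compiler pick the operand size for the target.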
Well Kuba, I'm not sure what more you are after here before you'll accept an answer. Still, it does give me the chance to include Peter's version.
4) Pointer increment:
__asm__ (
"cmp byte ptr[%[string]], 0\n\t"
"jz .Lprint%=\n"
".Loop%=:\n\t"
"inc %[length]\n\t"
"cmp byte ptr[%[length]], 0\n\t"
"jnz .Loop%=\n"
".Lprint%=:\n\t"
"sub %[length], %[string]"
: [length] "=&r" (x)
: [string] "r" (s), "[length]" (s)
: "cc", "memory"
);
This does not do the 'NULL pointer' check from #3, but it does do the 'pointer increment' that Peter was recommending. It also avoids potential duplicate symbols (using %=), and uses 'local' labels (ones that start with .L) to avoid extra symbols getting written to the object file.
From a "performance" point of view, this might be slightly better (I haven't timed it). However, from a "school project" point of view, the clarity of #3 seems like it would be a better choice. From a "what would I write in the real world if for some bizarre reason I HAD to write this in asm instead of just using a standard C function" point of view, I'd probably look at usage, and unless this was performance critical, I'd be tempted to go with #3 in order to ease future maintenance.
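For comparison, the "standard C function" route might look like the sketch below. string_length is a hypothetical wrapper name, and the -1 return for NULL simply mirrors the convention from #3; strlen itself does not accept NULL.

```c
#include <string.h>

/* Hypothetical wrapper: -1 for NULL (as in #3), otherwise strlen's result. */
int string_length(const char *s)
{
    if (s == NULL)
        return -1;
    return (int)strlen(s);   /* cast assumes the length fits in an int */
}
```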
Upvotes: 1