Reputation: 41
My apologies for the rookie question. I am diving deeper into assembly and am trying to use some inline asm in a small linear algebra library I wrote. Everything else works fine, but I am having trouble incrementing the pointers from inside the asm block. I need this step to walk through the elements of the matrices I am multiplying.
void multiply(int n, int m, int* ptrM1, int* ptrM2, int* result)
{
int counter = n*n*m;
int index = 1;
int inter = 0;
while (index != (counter+1))
{
asm
(
"movl %2, %%eax\n\t"
"mull %3\n\t"
"movl %%eax, %0\n\t"
"addl %0, %1\n\t"
"addq $0x04, (%2)\n\t" //<-----Problematic lines
"addq $0x04, (%3)\n\t" //<-----Problematic lines
:"+c"(inter),"+b"(*result)
:"r"(*ptrM1), "r"(*ptrM2)
:"%eax"
);
printf("%d ", *result);
//++ptrM1; //Want to do this in assembly
//++ptrM2; //Same
if ((index % m ) == 0){++result;}
++index;
}
}
The program compiles as is, but it crashes with a segmentation fault (core dumped) when I run it. I suspect the syntax of the two lines I marked above. I am also suspicious of the constraints I gave to the two pointer operands: they act as both input and output, but are only declared as inputs. The thing is, I literally copied how gcc handled "++pointer_address" in the disassembled code. Can anybody help me out? It would be much appreciated.
EDIT: Excellent tips in the comments so far, but they still don't fix the problem. Here is how I implemented them:
void multiply(int n, int m, int* ptrM1, int* ptrM2, int* result)
{
int counter = n*n*m;
int index = 1;
int inter = 0;
while (index != (counter+1))
{
asm
(
"movl %4, %%eax\n\t"
"mull %5\n\t"
"movl %%eax, %0\n\t"
"addl %0, %1\n\t"
"addq $0x04, %2\n\t" //Problematic lines
"addq $0x04, %3\n\t" //problematic lines
:"+c"(inter),"+b"(*result), "+rm"(ptrM1), "+rm"(ptrM2)
:"r"(*ptrM1), "r"(*ptrM2)
:"%eax"
);
printf("%d ", *result);
//++ptrM1; //Want to do this in assembly
//++ptrM2; //Same
if ((index % m ) == 0){++result;}
++index;
}
}
EDIT 2: And this is the original function written in C, for comparison.
void multiply(int n, int m, int* ptrM1, int* ptrM2, int* result)
{
int counter = n*n*m;
int index = 1;
int inter = 0;
while (index != (counter+1))
{
if ((index % m ) == 0)
{
inter = *ptrM1 * *ptrM2;
*result += inter;
++ptrM1;
++ptrM2;
++result;
}
else
{
inter = *ptrM1 * *ptrM2;
*result += inter;
++ptrM1;
++ptrM2;
}
++index;
}
}
Upvotes: 1
Views: 189
Reputation: 41
So thanks to everybody who commented. The final working function is the following:
void multiply(int n, int m, int* ptrM1, int* ptrM2, int* result)
{
int counter = n*n*m;
int index = 1;
int inter = 0;
while (index != (counter+1))
{
asm
(
"movl %4, %%eax\n\t"
"mull %5\n\t"
"movl %%eax, %0\n\t"
"addl %0, %1\n\t"
"addq $0x04, %2\n\t"
"addq $0x04, %3\n\t"
:"+c"(inter),"+b"(*result), "+rm"(ptrM1), "+rm"(ptrM2)
:"r"(*ptrM1), "r"(*ptrM2)
:"%eax", "%edx"
);
if ((index % m ) == 0){++result;}
++index;
}
}
So to summarize, I was both using the pointers wrong and failing to declare a clobbered register. The solution was to add the pointers themselves (not the dereferenced values) to the outputs with "+rm" constraints, and to clobber both eax and edx, since mull stores its 64-bit result in edx:eax.
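As a rough illustration of the two fixes summarized above, here is a minimal sketch, assuming GCC or Clang on x86-64 and a 4-byte int. The name dot_asm is hypothetical and not part of my library. It passes the pointers themselves as "+r" operands, loads through them inside the asm, and lists eax and edx as clobbers because mull writes edx:eax. On any other compiler or target it falls back to plain C.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original library): dot product of two
   int arrays, advancing both pointers inside the asm block. */
int dot_asm(const int *a, const int *b, size_t len)
{
    int sum = 0;
    for (size_t i = 0; i < len; ++i) {
#if defined(__GNUC__) && defined(__x86_64__)
        __asm__ (
            "movl (%[a]), %%eax\n\t"   /* eax = *a, loaded through the pointer */
            "mull (%[b])\n\t"          /* edx:eax = eax * *b (edx is overwritten) */
            "addl %%eax, %[sum]\n\t"   /* accumulate the low 32 bits */
            "addq $4, %[a]\n\t"        /* ++a, assuming sizeof(int) == 4 */
            "addq $4, %[b]\n\t"        /* ++b */
            : [sum] "+r"(sum), [a] "+r"(a), [b] "+r"(b) /* read and written */
            :
            : "eax", "edx", "cc", "memory" /* mull writes both halves; asm reads memory */
        );
#else
        /* Portable fallback with the same effect. */
        sum += *a * *b;
        ++a;
        ++b;
#endif
    }
    return sum;
}
```

For example, calling dot_asm on the arrays {1, 2, 3} and {4, 5, 6} with len 3 should return 32. The "memory" clobber is there because the asm reads through the pointers without naming the pointed-to values as operands; declaring them as "m" inputs would be an alternative.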
Upvotes: 1