Reputation: 582
I came across an exercise, as I am still trying to familiarise myself with assembly code.
I am unsure how to derive the types for a given struct, given the assembly code and the skeleton c code. Could someone teach me how this should be done?
This is the assembly code, where rcx
and rdx
hold the arguments i
and j
respectively.
randFunc:
movslq %ecx,%rcx // move i into rcx
movslq %edx, %rdx // move j into rdx
leaq (%rcx,%rcx,2), %rax //3i into rax
leaq (%rdx,%rdx,2), %rdx // 3j into rdx
salq $5, %rax // shift arith left 32? so 32*3i = 96i
leaq (%rax,%rdx,8), %rax //24j + 96i into rax
leaq matrixtotest(%rip), %rdx //store address of the matrixtotest in rdx
addq %rax, %rdx //jump to 24th row, 6th column variable
cmpb $10, 2(%rdx) //add 2 to that variable and compare to 10
jg .L5 //if greater than 10 then go to l5
movq 8(%rdx), %rax // else add 8 to the rdx number and store in rax
movzwl (%rdx), %edx //move the val in rdx (unsigned) to edx as an int
subl %edx, %eax //take (val+8) -(val) = 8? (not sure)
ret
.L5
movl 16(%rdx),%eax //move 1 row down and return? not sure about this
ret
This is the C code:
struct mat{
typeA a;
typeB b;
typeC c;
typeD d;
}
struct mat matrixtotest[M][N];
int randFunc(int i, int j){
return __1__? __2__ : __3__;
}
How do I derive the types of the variables a,b,c,d
? And what is happening in the 1) 2) 3) parts of the return statement ?
Please help me, I'm very confused about what's happening and how to derive the types of the struct from this assembly.
Any help is appreciated, thank you.
Upvotes: 0
Views: 356
Reputation: 26656
How do I derive the types of the variables a,b,c,d?
You want to see how variables are used, which will give you a very strong indication as to their size & sign. (These indications are not always perfect, but the best we can do with limited information, i.e. missing source code, and will suffice for your exercise.)
So, just work the code, one instruction after another to see what they do by the definitions they have in the assembler and their mapping to the instruction set, paying particular attention to the sizes, signs, and offsets specified by the instructions.
Let's start for example with the first instruction: movslq ecx, rcx
— this is saying that the first parameter (which is found in ecx
), is a 32-bit signed number.
Since rcx is Windows ABI first parameter, and the assembly code is asking for ecx
to be sign extended into rcx
, then we know that this parameter is a signed 32-bit integer. And you proceed to the next instruction, to glean what you can from it — and so on.
And what is happening in the 1) 2) 3) parts of the return statement ?
The ?: operator is a ternary operator known as a conditional. If the condition, placeholder __1__
, is true, it will choose the __2__
value and if false it will choose __3__
. This is usually (but not always) organized as an if-then-else branching pattern, where the then-part represents placeholder __2__
and the else part placeholder __3__
.
That if-then-else branching pattern looks something like this in assembly/machine code:
if <condition> /* here __1__ */ is false goto elsePart;
<then-part> // here __2__
goto ifDone;
elsePart:
<else-part> // here __3__
ifDone:
So, when you get to an if-then-else construct, you can fit that into the ternary operator place holders.
That code is nicely commented, but somewhat absent size, sign, and offset information. So, following along and derive that missing information from the way the instructions tell the CPU what sizes, signs, and offsets to use.
As Jester describes, if the code indexes into the array, because it is two-dimensional, it uses two indexes. The indexing takes the given indexes and computes the address of the element. As such, the first index finds the row, and so must skip ahead one row for each value in the index. The second index must skip ahead one element for each value in the index. Thus, by the formula in the comments: 24j + 96i, we can say that a row is 96 bytes long and an element (the struct) is 24 bytes long.
Upvotes: 2
Reputation: 58772
Due to the cmpb $10, 2(%rdx)
you have a byte sized something at offset 2. Due to the movzwl (%rdx), %edx
you have a 2 byte sized unsigned something at offset 0. Due to the movq 8(%rdx), %rax
you have a 8 byte sized something at offset 8. Finally due to the movl 16(%rdx),%eax
you have a 4 byte sized something at offset 16. Now sizes don't map to types directly, but one possibility would be:
struct mat{
uint16_t a;
int8_t b;
int64_t c;
int32_t d;
};
You can use unsigned short
, signed char
, long
, int
if you know their sizes.
The size of the structure is 24 bytes, with padding at the end due to alignment requirement of the 8 byte field. From the 96i
you can deduce N=4
probably. M
is unknown. As such 24j + 96i
accesses item matrixtotest[i][j]
. The rest should be clear.
Upvotes: 2