Megan Darcy

Reputation: 582

How to derive the types of the following data types from assembly code

I came across an exercise, as I am still trying to familiarise myself with assembly code.

I am unsure how to derive the types of a struct's members, given the assembly code and the skeleton C code. Could someone teach me how this should be done?

This is the assembly code, where rcx and rdx hold the arguments i and j respectively.

randFunc:
    movslq  %ecx,%rcx // move i into rcx
    movslq  %edx, %rdx // move j into rdx
    leaq    (%rcx,%rcx,2), %rax //3i into rax
    leaq    (%rdx,%rdx,2), %rdx // 3j into rdx
    salq    $5, %rax            // shift arith left 32? so 32*3i = 96i
    leaq    (%rax,%rdx,8), %rax //24j + 96i into rax
    leaq    matrixtotest(%rip), %rdx //store address of the matrixtotest in rdx
    addq    %rax, %rdx              //jump to 24th row, 6th column variable 
    cmpb    $10, 2(%rdx)       //add 2 to that variable and compare to 10
    jg      .L5                //if greater than 10 then go to l5
    movq    8(%rdx), %rax     // else add 8 to the rdx number and store in rax
    movzwl  (%rdx), %edx      //move the val in rdx (unsigned) to edx as an int   
    subl    %edx,   %eax      //take (val+8) -(val) = 8? (not sure) 
    ret
.L5:
    movl    16(%rdx),%eax    //move 1 row down and return? not sure about this
    ret
    

This is the C code:

struct mat{
    typeA a;
    typeB b;
    typeC c;
    typeD d;
};

struct mat matrixtotest[M][N];
int randFunc(int i, int j){
    return __1__? __2__ : __3__;
}

How do I derive the types of the variables a,b,c,d? And what is happening in the 1) 2) 3) parts of the return statement ?

Please help me, I'm very confused about what's happening and how to derive the types of the struct from this assembly.

Any help is appreciated, thank you.

Upvotes: 0

Views: 356

Answers (2)

Erik Eidt

Reputation: 26656

How do I derive the types of the variables a,b,c,d?

You want to see how variables are used, which will give you a very strong indication as to their size & sign.  (These indications are not always perfect, but the best we can do with limited information, i.e. missing source code, and will suffice for your exercise.)

So, just work through the code one instruction at a time, looking at what each instruction does according to the instruction set, and paying particular attention to the sizes, signs, and offsets the instructions specify.

Let's start, for example, with the first instruction: movslq %ecx, %rcx. This sign-extends the 32-bit value in ecx into the 64-bit register rcx.

Since rcx carries the first integer parameter in the Windows x64 ABI, and the assembly code asks for ecx to be sign-extended into rcx, we know that this parameter is a signed 32-bit integer. Then proceed to the next instruction to glean what you can from it, and so on.
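To see why sign extension points at a signed type, here is a minimal C sketch. The function names are hypothetical and not part of the exercise. On x86-64, compilers typically widen the signed case with a sign-extending move such as movslq and the unsigned case with a zero-extending move, though the exact instructions depend on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Widening a signed 32-bit value preserves its sign (sign extension). */
int64_t widen_signed(int32_t x) {
    return (int64_t)x;
}

/* Widening an unsigned 32-bit value fills the upper bits with zeros. */
int64_t widen_unsigned(uint32_t x) {
    return (int64_t)x;
}
```

The two functions agree for non-negative inputs and differ only when the top bit of the 32-bit value is set, which is why the choice of extension instruction reveals the signedness of the source type.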

And what is happening in the 1) 2) 3) parts of the return statement ?

The ?: operator is a ternary operator known as a conditional.  If the condition, placeholder __1__, is true, it will choose the __2__ value and if false it will choose __3__.  This is usually (but not always) organized as an if-then-else branching pattern, where the then-part represents placeholder __2__ and the else part placeholder __3__.

That if-then-else branching pattern looks something like this in assembly/machine code:

    if <condition> /* here __1__ */ is false goto elsePart;
    <then-part> // here __2__
    goto ifDone;
elsePart:
    <else-part> // here __3__
ifDone:

So, when you get to an if-then-else construct, you can fit it into the ternary operator's placeholders.
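The branching pattern above can be sketched in C with explicit gotos next to the equivalent ternary. The function and parameter names are hypothetical and chosen only for illustration.

```c
/* Hypothetical example: the ternary form as it might appear in source. */
int pick_ternary(int cond, int then_val, int else_val) {
    return cond ? then_val : else_val;
}

/* The same logic laid out the way the branching pattern describes it. */
int pick_branches(int cond, int then_val, int else_val) {
    int result;
    if (!cond) goto elsePart;   /* condition false: skip the then-part */
    result = then_val;          /* then-part, placeholder __2__ */
    goto ifDone;
elsePart:
    result = else_val;          /* else-part, placeholder __3__ */
ifDone:
    return result;
}
```

A compiler is free to invert the condition and swap the two parts, so the then-part does not always come first in the generated code.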

That code is nicely commented, but somewhat absent size, sign, and offset information.  So, follow along and derive that missing information from the way the instructions tell the CPU what sizes, signs, and offsets to use.

As Jester describes, the code indexes into the array, and because the array is two-dimensional, it uses two indexes.  The indexing takes the given indexes and computes the address of the element.  The first index selects the row, so it must skip ahead one whole row for each value in the index.  The second index must skip ahead one element for each value in the index.  Thus, by the formula in the comments, 24j + 96i, we can say that a row is 96 bytes long and an element (the struct) is 24 bytes long.
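The address arithmetic can be checked with a small C sketch. The function names and the two constants are assumptions for illustration; the sizes are simply read off the formula 24j + 96i.

```c
/* Assumed sizes, read off the formula 24j + 96i. */
enum { ELEM_BYTES = 24, ROW_BYTES = 96 };

/* Byte offset of element [i][j] from the start of the array. */
long long offset_by_formula(int i, int j) {
    return (long long)ROW_BYTES * i + (long long)ELEM_BYTES * j;
}

/* The same offset, computed step by step as the assembly does. */
long long offset_like_asm(int i, int j) {
    long long rax = (long long)i * 3;  /* leaq (%rcx,%rcx,2), %rax : 3i        */
    long long rdx = (long long)j * 3;  /* leaq (%rdx,%rdx,2), %rdx : 3j        */
    rax *= 32;                         /* salq $5, %rax : 3i * 2^5 = 96i       */
    return rax + rdx * 8;              /* leaq (%rax,%rdx,8), %rax : 96i + 24j */
}
```

Both functions return the same offset for every pair of indexes, which confirms that the shift by 5 is a multiplication by 32 and that the scale factor 8 turns 3j into 24j.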

Upvotes: 2

Jester

Reputation: 58772

Due to the cmpb $10, 2(%rdx) you have a byte-sized something at offset 2, and since the jg that follows is a signed comparison, it is probably signed. Due to the movzwl (%rdx), %edx you have a 2-byte unsigned something at offset 0. Due to the movq 8(%rdx), %rax you have an 8-byte something at offset 8. Finally, due to the movl 16(%rdx), %eax you have a 4-byte something at offset 16. Now sizes don't map to types directly, but one possibility would be:

struct mat{
    uint16_t a;
    int8_t b;
    int64_t c;
    int32_t d;
};

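You can use unsigned short, signed char, long, int if you know their sizes. (If the target is Windows x64, as the use of rcx and rdx for the arguments suggests, long is only 4 bytes there, so the 8-byte field would need long long.)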

The size of the structure is 24 bytes, with padding at the end due to the alignment requirement of the 8-byte field (the same requirement leaves a gap between b at offset 2 and c at offset 8). From the 96i you can deduce that N is probably 4, since 96 / 24 = 4. M is unknown. As such, 24j + 96i accesses item matrixtotest[i][j]. The rest should be clear.
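Putting the pieces together, one reading that is consistent with the listing looks like the sketch below. It is only one possibility: the member types are the fixed-width candidates from above, M = 3 is a made-up value because the assembly does not reveal it, and the layout checks assume a typical 64-bit target where int64_t is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mat {
    uint16_t a;   /* offset 0:  movzwl, 2 bytes, zero-extended       */
    int8_t   b;   /* offset 2:  cmpb + jg, 1 byte, signed comparison */
    int64_t  c;   /* offset 8:  movq, 8 bytes                        */
    int32_t  d;   /* offset 16: movl, 4 bytes                        */
};

/* Layout checks; these assume int64_t is 8-byte aligned. */
static_assert(offsetof(struct mat, b) == 2, "b at offset 2");
static_assert(offsetof(struct mat, c) == 8, "c at offset 8");
static_assert(offsetof(struct mat, d) == 16, "d at offset 16");
static_assert(sizeof(struct mat) == 24, "element is 24 bytes");

#define M 3   /* assumption: M cannot be recovered from the assembly */
#define N 4   /* 96 bytes per row / 24 bytes per element */

struct mat matrixtotest[M][N];

int randFunc(int i, int j) {
    /* jg .L5 is taken when b > 10 and returns d; otherwise the
       low 32 bits of c - a are returned. */
    return matrixtotest[i][j].b > 10
        ? matrixtotest[i][j].d
        : matrixtotest[i][j].c - matrixtotest[i][j].a;
}
```

The equivalent form b <= 10 ? c - a : d would compile to the same logic, so the assembly alone cannot tell which of the two the original source used.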

Upvotes: 2
