How to reverse engineer struct details from C source and asm output?

Question

I'm trying to understand the solution to this problem:

Given the C code below, and the asm output from the compiler, what are A and B?

Answer: A is 5, B is 6.

I am guessing there has to be some sort of division done, because 96 and 48 are both divisible by 6 and 20 is divisible by 5.

EDIT: I found this explanation for the answer online. However, I am not sure if it is accurate:
" a char starts at any BYTE

a short starts only at EVEN bytes

an int starts at BYTE, but divisible by 4

a long starts at BYTE which is divisible by 8

str1.w is long which starts at 5 to 8

str1.x may have 184 or 180

str2.p is int starts at the value 8, hence str1.array which holds from 5 to 8 BYTES

str2.q short may be 14 to 20

str2.z may be 32

char w[A][B] and int X

8 184

Str2.

short[B] int p doublez[B] short q

20 4 8 9

hence the value of A=5 and B=6"

Code below:

// #define A  ??   // 5
// #define B  ??   // 6, but the question is how to figure that out from the asm
typedef struct {
    char w[A][B];
    int x;
} str1;

typedef struct {
    short y[B];
    int p;
    double z[B];
    short q;
} str2;

void doSub(str1 *t, str2 *u) {
    int v1 = u->p;
    int v2 = u->q;
    t->x = v1-v2;
}

Assembly code generated for doSub procedure:

# t in %rdi, u in %rsi
doSub:
    movswl   96(%rsi), %edx
    movl     20(%rsi), %eax
    subl     %edx, %eax
    movl     %eax, 48(%rdi)
    ret

Peter Cordes · Accepted Answer

The asm is clearly for the AMD64 SysV ABI (more links in the x86 tag wiki). I conclude that because it's x86-64 code with the first two args in %rdi and %rsi. The alignment rules given in the answer you found do match the ABI's rules for struct layout: those types have their natural alignments. (n-byte types are n-byte aligned, except for 10B long double (x87 format) which is 16B-aligned).

The answer you found doesn't match your C and asm, so the A and B values are different. Sorry I didn't check this while tidying up the question; I just assumed it was right, since it's trivial to check the answer with a compiler.

The SO answer you found does indeed have different structs and different asm output, so any similarity in the numeric solution is just a coincidence. Nice work @MichaelPetch for finding the original source (and copying the markdown with formatting into the question).

The following code produces asm identical to that in your actual problem, with gcc 5.3 -O3 on the Godbolt compiler explorer:

#define A  5
#define B  9
typedef struct {
    char w[A][B];      // stored from 0 to A*B - 1
    int x;             // offset = 48 = A*B padded to a 4B boundary
} str1;

typedef struct {
    short y[B];        // 2*B bytes
    int p;             // offset = 20 = 2*B rounded up to a 4byte boundary
    double z[B];       // starts at 24 (20+4, already 8byte aligned), ends at 24 + 8*B - 1
    short q;           // offset = 96 = 24 + 8 * B
} str2;

void doSub(str1 *t, str2 *u) {
    int v1 = u->p;
    int v2 = u->q;
    t->x = v1-v2;
}

I added in what we know from the asm as comments on the structs.

str2 only depends on B, and has no ambiguity, so we can solve for B before worrying about A:

96 = 24 + 8 * B
72 = 8 * B
72/8 = 9 = B

Once we have B, str1 gives us A:

48 = align4(A*B) = align4(A*9)
45 <= A*9 <= 48
5 <= A <= 5.333
Only one integer solution: A == 5

Although honestly it was faster to solve by trial and error, since the compiler explorer site re-compiles automatically after any change. It was easy to iterate towards the right value for B to produce the 96 and 20 offsets.

Your A was already correct, but homing in on that would have been easy, since the problem was separable. It was never a case of two simultaneous equations in two unknowns.

This is where the "solution" starts to wander off track. Are you sure it was a solution to the exact same problem you posted?

str1.w is long which starts at 5 to 8
str1.x may have 184 or 180

str1.w in the code you posted is a 2-dimensional array of char, and starts at the beginning of the struct.

str1.x starts at 48 bytes into str1, as we can see from the asm.
