CIsForCookies

Reputation: 12817

Doing seemingly unneeded ops (crackme)

 80483ed:       b8 00 00 00 00          mov    $0x0,%eax
 80483f2:       83 c0 0f                add    $0xf,%eax
 80483f5:       83 c0 0f                add    $0xf,%eax
 80483f8:       c1 e8 04                shr    $0x4,%eax
 80483fb:       c1 e0 04                shl    $0x4,%eax
 80483fe:       29 c4                   sub    %eax,%esp

This is an assembly snippet from the start of the main function of a crackme binary, which I disassembled with objdump -d. The manipulation of eax looks very odd to me:

 1. eax = 0
 2. eax += 0xf
 3. eax += 0xf // eax = 0x1e (30 decimal, 11110 in binary)
 4. eax >>= 4 // eax = 1
 5. eax <<= 4 // eax = 16 (0x10)
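As a quick sanity check of the arithmetic in the steps above, here is a minimal C sketch that replays them on a 32-bit unsigned variable standing in for eax. The function name replay_eax_sequence is made up for illustration; it only mirrors the arithmetic and is not the crackme's actual source.

```c
#include <stdint.h>

/* Hypothetical helper: replays the five steps on a value standing in for EAX.
   It models the arithmetic only, not the crackme's real source code. */
uint32_t replay_eax_sequence(void)
{
    uint32_t eax = 0;   /* mov $0x0,%eax                  */
    eax += 0xf;         /* add $0xf,%eax  -> 0x0f          */
    eax += 0xf;         /* add $0xf,%eax  -> 0x1e (30)     */
    eax >>= 4;          /* shr $0x4,%eax  -> 0x01          */
    eax <<= 4;          /* shl $0x4,%eax  -> 0x10 (16)     */
    return eax;         /* the amount then subtracted from %esp */
}
```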

Is this some kind of fast way of manipulating eax that is good for some reason? Or is it just confusing C code that was compiled without optimization in order to throw off whoever tries to reverse-engineer it?

Upvotes: 0

Views: 160

Answers (2)

Margaret Bloom

Reputation: 44046

You just fell victim to a mild form of obfuscation, done specifically to slow down reverse engineering of the program.

Take this code for example:

[Image: An example of obfuscation]

It's from a real-world example: a VB6¹ packer used to deliver malware (I don't remember which one; I think it was Gootkit).
In this specific screenshot all the instructions are useless, but throughout the whole code you'll find a push <constant> and pop <reg> here and there: a silly way of doing mov <reg>, <constant>.

That's just to slow down the analyst (and possibly throw off beginners).
As long as it's easy, you can translate the code in your mind, but you may want to consider more sophisticated tools (like IDA or radare2) that let you comment and manipulate the code.

As the crackme difficulty increases, you should expect more obfuscation and tricks.


¹ This kind of packer ends up calling native code generated outside the VB6 compiler.

Upvotes: 4

Peter Cordes

Reputation: 364163

A compiler would never emit this with optimization enabled; it's clearly inefficient and was written by hand as an exercise.

There's no plausible way a compiler produced this asm, even without optimization. Multiple additions within one expression would be collapsed into a single add at compile time. Across separate statements, the value would be stored to memory and reloaded (except in GCC with register unsigned tmp;).

Subtracting it from the stack pointer means this would have to be in an alloca or a C99 VLA like char buf[tmp].

>>= 4 / <<= 4 is not how GCC or clang make sure the alloca size is a multiple of 16. With optimization disabled, GCC uses an insane div and imul even though the alignment is a power of 2, while clang uses a normal (a + 15) & -16.
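To make the two rounding strategies concrete, here is a small C sketch of each, written at the source level rather than as actual compiler output. The function names round16_div_mul and round16_mask are hypothetical; the point is only that, for unsigned sizes, dividing and multiplying by 16 gives the same result as masking with -16.

```c
#include <stdint.h>

/* Hypothetical helper shaped like GCC -O0 output:
   add (16 - 1), then divide by 16 and multiply by 16 again. */
uint32_t round16_div_mul(uint32_t size)
{
    return (size + 16 - 1) / 16 * 16;
}

/* Hypothetical helper shaped like clang's output:
   add 15, then clear the low 4 bits. (uint32_t)-16 is 0xfffffff0. */
uint32_t round16_mask(uint32_t size)
{
    return (size + 15) & (uint32_t)-16;
}
```

If you read the crackme's first add $0xf as producing a size of 15, then both helpers round 15 up to 16, which is the value the snippet subtracts from %esp.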

The second add $0xf, %eax, combined with the two shifts that knock off the low 4 bits, does actually implement that (size+15) & -16 calculation, rounding the allocation size up to a multiple of 16 (which keeps the stack aligned, and thus also the allocation itself).

So it could be a correct implementation of the following source (compiled with optimization disabled), but that's implausible because any sane compiler would use and $0xfffffff0, %eax to clear the low bits instead of two shifts.
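A minimal C sketch shows why the two shifts are equivalent to a single AND with 0xfffffff0 on a 32-bit unsigned value. The helper names are hypothetical, and this only models the arithmetic, not what any particular compiler emits.

```c
#include <stdint.h>

/* Hypothetical helper: the crackme's way of clearing the low 4 bits. */
uint32_t clear_low4_shifts(uint32_t x)
{
    x >>= 4;   /* like shr $0x4: bits 0..3 fall off the right end        */
    x <<= 4;   /* like shl $0x4: remaining bits move back, bits 0..3 = 0 */
    return x;
}

/* Hypothetical helper: the single instruction a compiler would normally use,
   and $0xfffffff0. */
uint32_t clear_low4_and(uint32_t x)
{
    return x & 0xfffffff0u;
}
```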

int foo(void) {
    register unsigned a asm("eax")= 0;  // otherwise GCC picks a call-preserved reg, EBX
    //register unsigned a = 0;
    a += 0xf;
    a += 0xf;
    a >>= 4;       // include this manually instead of as part of alloca / VLA size calc
    a <<= 4;
    volatile char buf[a];
    buf[0] = 0;
    return buf[0];
}

This does get us two back-to-back add instructions, because GCC still compiles every statement to a separate block of asm (for consistent debugging, even if you use jump in GDB to jump between source lines). See "Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?"

https://godbolt.org/z/x8W6d9: the GCC 10.2 -O0 -m32 -Wall output contains some of your sequence, but not the sub %eax, %esp right after the shifts:

foo:
        push    ebp
        mov     ebp, esp
        push    ebx
        sub     esp, 20
        mov     eax, esp
        mov     ecx, eax

        mov     eax, 0                 # sequence starts here
        add     eax, 15
        add     eax, 15
        shr     eax, 4
        sal     eax, 4                 # sal is a synonym for the same opcode as shl.  Disassembly would normally show shl
            # but that's as far as we can get
            # VLA size calculation to align the VLA by 16, and the stack, not just sub from ESP.
        mov     edx, eax
        sub     edx, 1
        mov     DWORD PTR [ebp-12], edx
        mov     edx, 16
        sub     edx, 1
        add     eax, edx
        mov     ebx, 16
        mov     edx, 0
        div     ebx                   # yes really, GCC -O0 emits a div for a constant 16
        imul    eax, eax, 16
        sub     esp, eax
        mov     eax, esp
        add     eax, 0
        mov     DWORD PTR [ebp-16], eax
        mov     eax, DWORD PTR [ebp-16]
        mov     BYTE PTR [eax], 0
        mov     eax, DWORD PTR [ebp-16]
        movzx   eax, BYTE PTR [eax]
        movsx   eax, al                    # should have done a movsx load in the first place
        mov     esp, ecx                   # pointless; saved EBX addressed relative to EBP
        mov     ebx, DWORD PTR [ebp-4]
        leave
        ret

clang chooses to just ignore the register keyword, keeping a in memory between statements.

Upvotes: 3
