Reputation: 505
I am reverse engineering some code and keep coming across sequences that don't appear to make sense.
For example:
push 0x3C2D1C06
mov [esp], ebx
mov [esp], edx
mov dword ptr [esp], 0x7D0B46E7
Is there logic behind this, or is it something that comes from compiler code generation? It occurs to me that this would net the same as
push 0x7D0B46E7
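As a sanity check on that reading, here is a minimal Python sketch of the stack effect. It is my own toy model, not anything taken from the binary: `run`, the op names `push` / `store_top`, and the register values are made up for illustration, and `stack[-1]` stands in for [esp].

```python
def run(ops, regs):
    """Toy model of the stack: stack[-1] plays the role of [esp]."""
    stack = []
    for op, src in ops:
        # src is either a register name (looked up in regs) or an immediate
        value = regs[src] if src in regs else src
        if op == "push":
            stack.append(value)
        elif op == "store_top":  # models: mov [esp], src
            stack[-1] = value
        else:
            raise ValueError(f"unsupported op: {op}")
    return stack


regs = {"ebx": 0x11111111, "edx": 0x22222222}  # arbitrary, made-up values

obfuscated = [
    ("push", 0x3C2D1C06),
    ("store_top", "ebx"),
    ("store_top", "edx"),
    ("store_top", 0x7D0B46E7),
]
simple = [("push", 0x7D0B46E7)]

print(run(obfuscated, regs) == run(simple, regs))
```

Only the last store to the top slot survives, so in this model the first three values are dead.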
Another example is:
pusha
push ecx
pop ecx
pusha
popa
popa
push eax
push edx
rdtsc
pusha
popa
pop edx
pop eax
I don't understand all of the push/pop combinations.
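To see what the second sequence nets out to, I modelled it in a few lines of Python. This is a rough sketch under my own assumptions: the stack is a plain list, the value kept under "esp" is only a placeholder (pusha saves it and popa discards it), the timestamp value is invented, and memory below the stack pointer is not tracked.

```python
PUSHA_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def run(program, regs, tsc=0x0123456789ABCDEF):
    """Execute a tiny subset of x86 on a dict of registers and a list stack."""
    regs = dict(regs)
    stack = []
    for line in program:
        op, *args = line.split()
        if op == "push":
            stack.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "pusha":
            stack.extend(regs[r] for r in PUSHA_ORDER)
        elif op == "popa":
            for r in reversed(PUSHA_ORDER):
                value = stack.pop()
                if r != "esp":  # popa throws away the saved esp
                    regs[r] = value
        elif op == "rdtsc":  # edx:eax = timestamp counter (made-up value here)
            regs["edx"], regs["eax"] = tsc >> 32, tsc & 0xFFFFFFFF
        else:
            raise ValueError(f"unsupported op: {op}")
    return regs, stack


program = [
    "pusha", "push ecx", "pop ecx", "pusha", "popa", "popa",
    "push eax", "push edx", "rdtsc", "pusha", "popa", "pop edx", "pop eax",
]
start = {name: i + 1 for i, name in enumerate(PUSHA_ORDER)}  # arbitrary values
final, stack = run(program, start)
print(final == start, stack)
```

In this model every register ends up with its starting value and the stack is balanced, so the whole block behaves like a very slow NOP. Even the rdtsc result is thrown away by the two pops that follow it.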
EDIT:
I am really intrigued now about the root of this. I have to believe that it comes from one of a few sources:
1.) As mentioned, it is possible that this was done to obscure the code from reverse engineering
2.) Some form of optimization that is over my head (I have a hard time seeing this one)
3.) Some form of compiler action, possibly linking multi-platform libraries or something
I am convinced that the code is disassembled correctly because, although the sequences may be odd, they never corrupt the stack. I am curious about some of the sequences, such as
push edi
push eax
push eax
mov eax, 0x577D25CD
mov [esp], eax
push eax
pop edi
and edi, 0x3F7BEBDC
shl edi, 0x3
xor edi, 0x6E778451
sub edi, 0x0D5BE8A31
mov ebx, edi
push [esp]
pop edi
To start with
push edi
push eax
push eax
mov eax, 0x577D25CD
mov [esp], eax
pop eax
pop edi
I believe is the same as writing
push edi
push 0x577D25CD
pop edi
OR even
push edi
mov edi, 0x577D25CD
But even more confusing is what follows,
and edi(0x577D25CD), 0x3F7BEBDC ->edi=0x177921CC
shl edi(0x177921CC), 3 ->edi=0xBBC90E60
xor edi(0xBBC90E60), 0x6e778451 -> edi=0xD5BE8A31
sub edi(0x0D5BE8A31), 0x0D5BE8A31 -> edi=0x0
mov ebx, edi
So that could be replaced with
mov ebx, 0x0
OR xor ebx, ebx
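The arithmetic is easy to double-check outside the debugger. A minimal Python sketch, assuming plain 32-bit wraparound and ignoring flags (the name `trace` is just mine):

```python
MASK32 = 0xFFFFFFFF  # keep every result in 32 bits, like a register


def trace(edi=0x577D25CD):
    """Return the value of edi after each of the four instructions."""
    steps = []
    edi &= 0x3F7BEBDC                   # and edi, 0x3F7BEBDC
    steps.append(edi)
    edi = (edi << 3) & MASK32           # shl edi, 0x3
    steps.append(edi)
    edi ^= 0x6E778451                   # xor edi, 0x6E778451
    steps.append(edi)
    edi = (edi - 0xD5BE8A31) & MASK32   # sub edi, 0x0D5BE8A31
    steps.append(edi)
    return steps


for value in trace():
    print(f"0x{value:08X}")
# 0x177921CC
# 0xBBC90E60
# 0xD5BE8A31
# 0x00000000
```

The constants only cancel because the subtracted immediate is exactly the value produced by the three preceding steps.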
So this leaves me with a few new questions. Does this point more firmly to a form of reverse engineering protection, and are there tools that do this? Could there be any other logical reason to execute a stream like this? Is it fair to assume that a compiler would not perform operations that set values below ESP? I have not found any indication that the values below ESP are used.
A couple of last points: I have started distilling the code down to just its effective result, and so far everything works as before and I've removed over 200 bytes of code. This further points me to the idea that I'm interpreting the code correctly. I also find it hard to believe that incorrectly disassembled code would produce a sequence of xor, and, shl operations that comes out to exactly 0x00.
EDIT 2:
I can now say with confidence that this really is code, and that I am interpreting it correctly. I focused on one section (I posted a portion of it in my first edit), and when I sorted through the 86 asm instructions I found the result was simply this:
start:
test ebx, ebx
jz end
add [eax], ecx
add eax, 0x04
sub ebx, 0x1
jmp start
end:
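Written as a higher-level routine, that loop is roughly the following. This is a Python sketch of my reading, assuming eax points at an array of 32-bit values, ebx holds the element count, and ecx holds the value to add; `add_delta` is a name I made up, and the list index stands in for the pointer.

```python
MASK32 = 0xFFFFFFFF


def add_delta(table, delta):
    """Add delta to every 32-bit entry, mirroring the distilled loop."""
    out = list(table)
    eax, ebx, ecx = 0, len(out), delta
    while ebx != 0:                            # test ebx, ebx / jz end
        out[eax] = (out[eax] + ecx) & MASK32   # add [eax], ecx
        eax += 1                               # add eax, 0x04 (next dword)
        ebx -= 1                               # sub ebx, 0x1
    return out                                 # jmp start closes the loop


# Made-up input values, only to show the wraparound behaviour
print([hex(v) for v in add_delta([0x1000, 0x2000, 0xFFFFFFFF], 0x400000)])
```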
The rest was, as mentioned before, "clobbering" the stack with odd combinations of pushes, pops, xchgs, and arithmetic on esp. Sometimes there was something like sub esp, 0x4 followed by mov [esp], eax, which is the same as push eax.
In the end, the section I worked through was offsetting an array of addresses by the Image_Base, a form of relocation I guess. When I peek at the relocated addresses they point to code segments and what look like the starts of procedures. I have a hard time seeing someone writing the thousands of lines of code that appear to do this, which makes me wonder:
Does anyone know of a tool that does this? Has someone created a program that takes code and obscures it with nonsense to prevent reverse engineering?
Upvotes: 0
Views: 343
Reputation: 365757
The most plausible source of these instruction sequences is that they're hand-crafted as obfuscation, possibly in inline asm.
As I commented earlier, if that's the case then this code is working exactly as intended: it's confused / puzzled you and caused you to spend significant time on it instead of just reverse-engineering the actual program logic.
The fact that none of these sequences crash or mess up the stack is a clear sign that it's definitely not accidental execution of data as code.
Your test of actually setting breakpoints in this code to see that it was actually executed was useful here, too. It's not uncommon for disassembly output to include some data stored in code sections. But seeing that it is executed, we know it's not just data.
"2.) Some form of optimization that is over my head (I have a hard time seeing this one)"
You are right to be skeptical of this idea, but yes, you always need to consider that possibility.
Sometimes sub-optimal code exists because someone thought it would be fast, at least on some specific CPU. But that's not even plausible here. I can't believe anyone would think these long convoluted sequences would be faster on any x86 CPU than xor ebx,ebx, or push imm32, or mov r32, imm32. Those instructions are all fast on their own, and the sequence that's equivalent to push imm32 actually uses a push imm32 plus some dead stores, so it's clearly not trying to avoid push imm32 for some hypothetical CPU where that instruction is slow.
Especially in the NOP case, zero instructions is always better. If you need padding, Intel and AMD publish recommended long-NOP sequences that have minimal overhead. pusha / popa are slow on everything, and it's pretty obvious they do a lot of work, so it doesn't make sense that someone would intentionally use them as padding.
Other than obfuscation, the other plausible purpose is intentional delay. That's sort of a performance reason, but if timing is part of the correctness of your program, fixed sequences like this are not a reliable way to achieve it across a range of different x86 microarchitectures!
But I could believe that some / all of these were a bad implementation of "waste several cycles here". pusha / popa in particular take a lot of time in few instruction bytes.
"Does anyone know of a tool that does this? Has someone created a program that takes code and obscures it with nonsense to prevent reverse engineering?"
Good question. Inserting crazy NOPs or turning push imm32 or xor-zeroing into obfuscated sequences is something a tool could maybe do automatically. But usually you'd want to avoid that inside any loops that are actually important for overall performance, so IDK.
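To show that this kind of rewrite is mechanical, here is a rough Python sketch of how a tool might expand a register-zeroing instruction into a sequence shaped like the one in the question. It is purely hypothetical: `obfuscate_zero` and `evaluate` are names I made up, not any real tool's API, and only the final register value is modelled, not flags or timing.

```python
import random

MASK32 = 0xFFFFFFFF


def obfuscate_zero(reg, rng):
    """Emit mov/and/shl/xor/sub lines that leave reg equal to zero."""
    seed, mask, key = (rng.getrandbits(32) for _ in range(3))
    shift = rng.randrange(1, 8)
    # Precompute what the first four instructions produce, then subtract it.
    result = (((seed & mask) << shift) & MASK32) ^ key
    return [
        f"mov {reg}, 0x{seed:08X}",
        f"and {reg}, 0x{mask:08X}",
        f"shl {reg}, {shift}",
        f"xor {reg}, 0x{key:08X}",
        f"sub {reg}, 0x{result:08X}",
    ]


def evaluate(lines):
    """Interpret the emitted lines for a single 32-bit register."""
    value = 0
    for line in lines:
        op, _reg, imm = line.replace(",", " ").split()
        imm = int(imm, 0)
        if op == "mov":
            value = imm
        elif op == "and":
            value &= imm
        elif op == "shl":
            value = (value << imm) & MASK32
        elif op == "xor":
            value ^= imm
        elif op == "sub":
            value = (value - imm) & MASK32
        else:
            raise ValueError(f"unsupported op: {op}")
    return value


lines = obfuscate_zero("ebx", random.Random(0))
print("\n".join(lines))
print(evaluate(lines))
```

Every run with a different random seed gives different constants but the same net effect, which would explain why the constants in the question look arbitrary.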
Upvotes: 1