bot47

Reputation: 1514

Identifying Algorithms in Binaries

Does anyone know a technique for identifying algorithms in already-compiled files, e.g. by scanning the disassembly for known patterns?

The little information I have is that there is some (non-exported) code in a library that decompresses the contents of a Byte[], but I have no idea how it works. I have some files that I believe are compressed in that unknown way, and they appear to have no compression header or trailer. I assume there's no encryption, but as long as I don't know how to decompress them, they're worthless to me.

The library I have is an ARM9 binary for low-capacity targets.

EDIT: It's lossless compression, used for binary data or plain text.

Upvotes: 3

Views: 1931

Answers (5)

Gareth Rees

Reputation: 65854

The reliable way to do this is to disassemble the library and read the resulting assembly code for the decompression routine (and perhaps step through it in a debugger) to see exactly what it is doing.

However, you might be able to look at the magic number at the start of the compressed file and so figure out what kind of compression was used. If it's a zlib stream (DEFLATE data with a zlib wrapper), for example, the first two bytes will typically be hexadecimal 78 9c (at the default compression level); if bzip2, 42 5a ("BZ"); if gzip, 1f 8b.
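A minimal sketch of that check in Python, covering only the three formats mentioned above. For zlib it tests the header rule from RFC 1950 (compression method 8, window field at most 7, and the first two bytes read as a big-endian number divisible by 31) rather than matching 78 9c literally, since other compression levels produce 78 01, 78 5e or 78 da. Note that raw DEFLATE data has no magic number at all, so a headerless file like the one in the question will usually come back as "unknown" (and a random pair of bytes can occasionally pass the zlib test by chance).

```python
def guess_format(data: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:3] == b"BZh":  # bzip2 streams start with "BZh" plus a block-size digit
        return "bzip2"
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # RFC 1950: CM (low nibble of CMF) must be 8 (deflate), CINFO <= 7,
        # and CMF*256 + FLG must be a multiple of 31.
        if cmf & 0x0F == 8 and cmf >> 4 <= 7 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    return "unknown"
```

Usage: `guess_format(open("file.bin", "rb").read(16))`, where the file name is just a placeholder.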

Upvotes: 2

Brian

Reputation: 25824

Reverse engineering done by viewing the assembly may have copyright issues. In particular, doing this to write a program for decompressing is almost as bad, from a copyright standpoint, as just using the assembly yourself. But the latter is much easier. So, if your motivation is just to be able to write your own decompression utility, you might be better off just porting the assembly you have.

Upvotes: 0

r0u1i

Reputation: 3566

In my experience, most of the time the files are compressed using plain old Deflate. You can try using zlib to open them, starting from different offsets to compensate for custom headers. The problem is that zlib adds its own header (and expects it when decompressing). In Python (and I guess other implementations have that feature as well), you can pass -15 to zlib.decompress as the history buffer (window) size, i.e. zlib.decompress(data, -15), which causes it to decompress raw deflated data without zlib's header.
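A rough sketch of that offset scan in Python, assuming the payload is raw DEFLATE possibly preceded by a short custom header. It uses `zlib.decompressobj(-15)` instead of `zlib.decompress` so that any bytes after the end of the stream are simply left in `unused_data`, and so it can check `eof` to keep only offsets where a complete stream was decoded. The `max_offset` and `min_output` limits are arbitrary choices, not anything taken from the library in question.

```python
import zlib

def find_raw_deflate(data: bytes, max_offset: int = 64, min_output: int = 16):
    """Try to inflate raw DEFLATE data starting at each offset in [0, max_offset).

    Yields (offset, decompressed_bytes) for every offset where a complete
    stream decodes and produces at least min_output bytes.
    """
    for offset in range(min(max_offset, len(data))):
        d = zlib.decompressobj(-15)  # negative wbits: raw DEFLATE, no zlib header/trailer
        try:
            out = d.decompress(data[offset:])
        except zlib.error:
            continue  # not a valid DEFLATE stream from this offset
        if d.eof and len(out) >= min_output:
            yield offset, out
```

Hits at small offsets are worth eyeballing: short spurious decodes can occur, which is why very small outputs are filtered out.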

Upvotes: 0

sfossen

Reputation: 4778

You could go in a couple of directions: static analysis with something like IDA Pro, or loading it into GDB or an emulator and following the code at runtime. Since there are already many good lossless compression techniques, they may simply be using a standard one and XOR'ing the data to hide the algorithm.
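If you suspect that kind of obfuscation, a cheap first test is to brute-force the simplest case. The sketch below is hypothetical: it assumes the hidden algorithm is raw DEFLATE and that the "XOR" is a single constant key byte, and it just tries all 256 keys. A real scheme could easily use a longer or rolling key, in which case this finds nothing.

```python
import zlib

def try_single_byte_xor(data: bytes, min_output: int = 16):
    """Hypothetical check: assume the payload is raw DEFLATE XORed with one
    constant byte, and return every (key, output) pair that inflates cleanly."""
    hits = []
    for key in range(256):
        candidate = bytes(b ^ key for b in data)  # undo the assumed XOR
        d = zlib.decompressobj(-15)  # raw DEFLATE, no zlib header
        try:
            out = d.decompress(candidate)
        except zlib.error:
            continue
        if d.eof and len(out) >= min_output:  # require a complete stream
            hits.append((key, out))
    return hits
```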

Upvotes: 3

Adam Davis

Reputation: 93565

Decompression algorithms involve a lot of tight looping. You might start by looking for loops (decrement a register, jump backwards if not 0).
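As a starting point, backward branches can be found without a full disassembler. This Python sketch assumes the code is 32-bit ARM state (not Thumb, which ARM9 cores also support), little-endian and 4-byte aligned from `base`. It decodes only the B instruction (bits 27..24 = 1010), sign-extends the 24-bit word offset, and reports branches whose target lies behind them; in ARM state the PC reads as the instruction address plus 8. Each hit is only a candidate loop back-edge, since data mixed into the code can also decode as a branch.

```python
import struct

# Condition-code mnemonics for cond fields 0x0..0xE (0xF is a different instruction class).
COND = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
        "HI", "LS", "GE", "LT", "GT", "LE", "AL"]

def backward_branches(code: bytes, base: int = 0):
    """Scan little-endian ARM-mode code for B<cond> with a negative offset.

    Returns a list of (address, target, condition) tuples.
    """
    results = []
    for off in range(0, len(code) - 3, 4):
        (insn,) = struct.unpack_from("<I", code, off)
        cond = insn >> 28
        # bits 27..24 == 0b1010 means B (branch without link)
        if cond == 0xF or (insn >> 24) & 0xF != 0b1010:
            continue
        imm24 = insn & 0x00FFFFFF
        if imm24 & 0x00800000:  # sign-extend the 24-bit word offset
            imm24 -= 1 << 24
        if imm24 < 0:
            addr = base + off
            target = addr + 8 + (imm24 << 2)  # PC = instruction address + 8
            results.append((addr, target, COND[cond]))
    return results
```

For the "decrement, jump back if not 0" pattern, `SUBS r0, r0, #1` (0xE2500001) followed by a `BNE` back to it shows up as a single `"NE"` hit. Loops that are dense with loads, stores and shifts are then the ones worth reading first.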

Given that it's a small target, you have a good chance of decoding it by hand. Though it looks hard now, once you dive into it you'll find that you can identify the various programming structures yourself.

You might also consider decompiling it to a higher-level language, which would be easier to read than assembly, though still hard if you don't know how it was compiled.

http://www.google.com/search?q=arm%20decompiler

-Adam

Upvotes: 2
