Reputation: 13
I need to write a script that will extract all the data out of a client's files without having to launch the application they were created in. The application saves all data to an XML file, encrypts only the content of said file before it compresses it, and finally changes the extension of the compressed file to make it "more difficult" to recognize.
The application has a text viewer that will show part, but not all, of the data in the client file, so my coworkers and I have to combine copying that text with tabbing through the fields in all the other areas we need to extract data from. Attempting to use the application itself is slow and practically useless, as it has horrendous memory issues that cause it to crash constantly.
Anyway, I have been able to figure out some of the basics because the encryption seems fairly weak, or at least its pattern is easy to see. The same characters produce the same output in every client file, from every customer, on every machine I am going to be using, so the encryption is the same between all files.
It changes blocks of three characters (adding a character to the beginning of each block of three), restarting on the fourth. The = sign appears to be a null character.
For example: A becomes QQ==, AA is QUE=, AAA is QUFB, and AAAA is QUFBQQ==.
I've found the basics: it's just a list of all the Unicode characters that are changed. For example, QQ== would be A, Qg== is B, Qw== is C, then we move to RA== for D, RQ== for E, and cycle onwards through the character table.
It starts to get fun when we introduce a second and/or third character to the string, as it now has ITA= for !0, and moves forward four alphanumeric characters at a time like so: ITE= for !1, ITY= for !6, rolling over to ITc= for !7, and on up to IT8= for !? before moving to the next character in the second position and starting over like so: IUA= for !@. And so on.
Anyway, I would love some pointers on a few things here. How do I take what I know and find the algorithm? And from there, how do I use that to decrypt the rest of the data?
Upvotes: 1
Views: 1278
Reputation: 86744
That's not encryption, that's BASE64 encoding. It is a method of encoding a binary data stream using only 64 printable ASCII characters. It is used when sending binary data over a communications channel that may not correctly handle binary data (e.g. a binary file sent as an email attachment).
Every 3 input bytes (24 bits) are encoded as 4 output characters, each drawn from an alphabet of only 64 possibilities (6 effective bits per character * 4 = 24 bits). The trailing = signs are padding, added when the input length is not a multiple of 3.
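To make the bit grouping concrete, here is a minimal sketch in Python that encodes by hand and compares the result with the standard library. The function name `encode_by_hand` is made up for illustration, and the alphabet shown is the standard BASE64 one, which the samples in the question appear to match.

```python
import base64

# Standard BASE64 alphabet: index 0-63 maps to one output character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_by_hand(data: bytes) -> str:
    """Hypothetical helper: encode bytes as BASE64, 3 bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1 or 2 missing bytes in the last group
        # Pack the (zero-filled) group into one 24-bit integer.
        n = int.from_bytes(chunk + b"\x00" * pad, "big")
        # Slice the 24 bits into four 6-bit indexes into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            # Output characters that carry no input bits become '='.
            chars[-pad:] = "=" * pad
        out.append("".join(chars))
    return "".join(out)

for text in (b"A", b"AA", b"AAA", b"AAAA", b"!0"):
    by_hand = encode_by_hand(text)
    # The library result should match the hand-rolled one.
    assert by_hand == base64.b64encode(text).decode("ascii")
    print(text, "->", by_hand)
```

This also explains the pattern noticed in the question: changing the second input character by one changes the third output character in steps of four, because only the top four bits of that output character come from the second byte.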
There are library methods for converting to and from BASE64 in just about all major languages, even XSLT (Google "XSLT base64").
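As one example, Python's standard `base64` module can decode the strings from the question directly. This is only a sketch: `decode_field` is a made-up name, and it assumes the decoded bytes are UTF-8 text, which the question does not confirm. The decompression step and the XML layout are not described in the question, so they are not shown here.

```python
import base64

def decode_field(encoded: str) -> str:
    """Hypothetical helper: decode one BASE64 string to text.

    Assumes the underlying bytes are UTF-8; adjust the codec if the
    application turns out to use a different text encoding.
    """
    raw = base64.b64decode(encoded, validate=True)
    return raw.decode("utf-8")

# Sample strings taken from the question.
samples = ["QQ==", "QUE=", "QUFB", "QUFBQQ==", "ITA=", "ITc="]
decoded = {s: decode_field(s) for s in samples}

for encoded, text in decoded.items():
    print(f"{encoded!r} -> {text!r}")
```

Passing `validate=True` makes the call raise an error on characters outside the BASE64 alphabet instead of silently skipping them, which helps when checking whether every field in a file really is BASE64.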
There's even a website (actually several) to convert to and from BASE64.
Upvotes: 2