Reputation: 4094
I want to obfuscate a particular string in the binary of a C program to make it harder to analyze. I know this will not prevent someone from seeing the string if running it in a debugger. Yes, this is merely obfuscation.
Every instance of obfuscation triggers a discussion saying it has no value whatsoever. So did this one! I am aware that a capable and determined attacker will be able to recover the string. For the sake of the argument let's say I'm writing a game for X year olds and the string to be hidden is a URL to be called only once they beat the game and their name will be added to the hall of fame. It's reasonable to assume that most X year olds will not have skills that go beyond opening the binary file in a hex editor. Thanks!
Is there some elegant way to do the hiding at compile time, perhaps using the C preprocessor and a macro?
What I have seen so far is a suggestion by Yuri Slobodyanyuk resulting in this:
#define HIDE_LETTER(a) ((a) + 0x50)
#define UNHIDE_STRING(str) do { char * ptr = str ; while (*ptr) *ptr++ -= 0x50; } while(0)
...
char str1[] = { HIDE_LETTER('s'), HIDE_LETTER('e'), HIDE_LETTER('c'), HIDE_LETTER('r'), HIDE_LETTER('e'),
HIDE_LETTER('t'), '\0' };
UNHIDE_STRING(str1); // unmangle the string in-place
It works but it's a bit ugly. 🙂 Perhaps someone knows a better solution?
I'm fine with something that is gcc specific.
PS: For C++ there is a solution by Adam Yaxley on github but I'm looking for C, not C++. And there's a solution with a little helper program at https://github.com/TwizzyIndy/hkteam_obfuscator
Upvotes: 1
Views: 2987
Reputation: 4094
I changed the obfuscation to just flip bit 7. I also couldn't find a pretty way to do the encoding in the C preprocessor (cpp) at compile time, so I ended up encoding the string with this shell one-liner:
tr \\000-\\377 \\200-\\377\\0-\\177|od -t x1 -A none|sed -e 's/ /\\x/g'
or this PowerShell one-liner:
[System.Text.Encoding]::UTF8.GetBytes((Read-Host)) | ForEach-Object { if ($_ -lt 128) { ($_ -bor 0x80) } else { ($_ -band 0x7F) } } | ForEach-Object { '\x{0:X2}' -f $_ } | Write-Host -NoNewline
and sticking the result into the C source:
#include <stdio.h>
#include <string.h>
/* flip bit 7 in string using shell commands
tr \\000-\\377 \\200-\\377\\0-\\177|od -t x1 -A none|sed -e 's/ /\\x/g'
*/
int main(void) {
    /* each byte below is the plain text with bit 7 flipped */
    char secret[] = "\xce\xef\xf4\xa0\xf5\xf3\xe9\xee\xe7\xa0\xf4\xe8"
                    "\xe5\xa0\xf0\xf2\xe5\xf0\xf2\xef\xe3\xe5\xf3\xf3\xef\xf2"
                    "\xa0\xba\xad\xa8";
    for (int i = 0; secret[i]; i++)
        secret[i] ^= 1 << 7; // flip bit 7
    printf("%s\n", secret);
    return 0;
}
I will leave this question as unanswered for now in the hope that someone finds a one-step solution instead of this two-step approach.
Upvotes: 0
Reputation: 47942
How about this:
#define STRING "Obfuscated"
#define Makestr(i) string[i] = STRING[i]
char string[11];
Makestr(6); Makestr(5);
Makestr(9); Makestr(7);
Makestr(0); Makestr(3);
Makestr(2); Makestr(4);
Makestr(1); Makestr(8);
Makestr(10);
This will typically compile to the equivalent of
string[6] = 97; string[5] = 99;
string[9] = 100; string[7] = 116;
string[0] = 79; string[3] = 117;
string[2] = 102; string[4] = 115;
string[1] = 98; string[8] = 101;
string[10] = 0;
If you look at the object file using strings or a hex editor, it won't even be obvious that there's a string at all. (But if you step through the code in a debugger, you'd be able to suss out what it was doing soon enough. No way around that, really.)
You could also perturb the individual characters, as in your original question:
#define Makestr(i) string[i] = STRING[i] + 0x50
Me, I'd worry about overflow, so I'd probably do
#define Makestr(i) string[i] = STRING[i] ^ 0x55
Now you get the equivalent of
string[6] = 177;
for the + 0x50 variant, or
string[6] = 52;
for the ^ 0x55 variant, and so on for the other characters.
In these cases you of course also have to unhide the constructed string at run time.
With clang I had to use -O to force it to collapse the constants and not emit the original string in the object file; with gcc it worked right away.
If your string is longer, the randomly-shuffled sequence of Makestr calls could get pretty unwieldy, though.
Upvotes: 2
Reputation: 1
First, be aware that your issue is probably better covered by some legal approach (a contract reviewed by a paid lawyer) than by technical means.
Your approach is similar to the Caesar cipher, which was broken centuries ago (insight: compute the frequencies of letters; in English, e is the most frequent one). Even the German Enigma machine did a lot better in WW2. Read about the work of Alan Turing during WW2 (his team broke the Enigma machine encryption).
Is there some elegant way to do it at compile time, perhaps using the C preprocessor and a macro?
(and mathematical proofs of that exist in the literature, covered by books related to Frama-C, cybersecurity, or the Coq proof assistant; be aware of Rice's theorem; read also the Bertot and Castéran book Interactive Theorem Proving and Program Development, ISBN 3-540-20854-2)
The argument of such a proof is based on cardinality. You could also use a probabilistic approach: store in your program some cryptic hashcode (e.g. computed by crypt(3) at build time) and ask the user to enter a secret key, etc...
Any professional hacker will be technically able (perhaps after weeks of work) to find your "secret" string, and so will colleagues working on or with BinSec.
However, you could write some metaprogram generating your obfuscated string as C code (to be #include-d at compile time), and add some deobfuscation routine to your program.
I'm fine with something that is gcc specific.
On large programs, consider developing your own GCC plugin (perhaps starting with Bismon). See also the DECODER project.
Be however aware of Rice's theorem. Read about P vs NP problem.
Consider also generating some C code (maybe some #include-d header) with tools like GPP.
Code obfuscation is a topic which has conferences. Did you attend any of them? Many papers exist in ACM conferences.
There could be also legal issues (perhaps related to GDPR). You should contact your lawyer. In France, see article 323 du Code Pénal.
If your code runs on a computer connected to the Internet and interacting with a user, consider a SaaS approach: you could ask for some money with a VISA card at every run (or once a month). Your bank will sell you appropriate software and permissions.
I'm writing a game for 8 year olds and the string to be hidden is a URL to be called only once they beat the game and their name will be added to the hall of fame. It's reasonable to assume that most 8 year olds will not have skills that go beyond opening the binary file in a hex editor.
I know no 8-year-old kid able to do that, and those who can deserve to be added to your hall of fame. If indeed you are coding a game, I recommend putting the URL as clear text.
NB. The old XPM program could be inspirational, and so can be RefPerSys and Jacques Pitrat's last book Artificial Beings, the conscience of a conscious machine (ISBN-13: 978-1848211018). Feel free to contact me by email [email protected] (home) or [email protected] (office, at CEA LIST) for more.
PS. Consider of course starting your PhD on that topic! In France, at ENS or Ecole Polytechnique. There are interesting related talks at Collège de France. In Germany, the Fraunhofer cybersecurity lab. Probably, the Bundeswehr will fund your research in Germany (but I have no connections there), and also ITEA4. Of course, you will spend three or four years full-time to find a good enough solution. Please publish papers on arXiv.
Upvotes: 3