Reputation: 3415

How to hide a string in binary code?

Sometimes it is useful to hide a string inside a binary (executable) file. For example, it makes sense to hide encryption keys that are embedded in binaries.

When I say “hide”, I mean making strings harder to find in the compiled binary.

For example, this code:

const char* encryptionKey = "My strong encryption key";
// Using the key

after compilation produces an executable file with the following in its data section:

4D 79 20 73 74 72 6F 6E-67 20 65 6E 63 72 79 70   |My strong encryp|
74 69 6F 6E 20 6B 65 79                           |tion key        |

You can see that our secret string can be easily found and/or modified.

I could hide the string…

char encryptionKey[30];
int n = 0;
encryptionKey[n++] = 'M';
encryptionKey[n++] = 'y';
encryptionKey[n++] = ' ';
encryptionKey[n++] = 's';
encryptionKey[n++] = 't';
encryptionKey[n++] = 'r';
encryptionKey[n++] = 'o';
encryptionKey[n++] = 'n';
encryptionKey[n++] = 'g';
encryptionKey[n++] = ' ';
encryptionKey[n++] = 'e';
encryptionKey[n++] = 'n';
encryptionKey[n++] = 'c';
encryptionKey[n++] = 'r';
encryptionKey[n++] = 'y';
encryptionKey[n++] = 'p';
encryptionKey[n++] = 't';
encryptionKey[n++] = 'i';
encryptionKey[n++] = 'o';
encryptionKey[n++] = 'n';
encryptionKey[n++] = ' ';
encryptionKey[n++] = 'k';
encryptionKey[n++] = 'e';
encryptionKey[n++] = 'y';
encryptionKey[n] = '\0'; // terminate the string

…but it's not a nice method. Any better ideas?

PS: I know that merely hiding secrets doesn't work against a determined attacker, but it's much better than nothing…

Also, I know about asymmetric encryption, but it's not acceptable in this case. I am refactoring an existing application which uses Blowfish encryption and passes encrypted data to the server (the server decrypts the data with the same key).

I can't change the encryption algorithm because I need to provide backward compatibility. I can't even change the encryption key.

Upvotes: 84

Answers (23)

S. Francis

Reputation: 119

As others mentioned, obfuscating strings is not a good way to achieve protection. If you only want to prevent casual examination of the string in the binary executable, then there are a couple of things that I would do.

I would create a large master string (Ms) and embed individual parts of the string to be protected (S1) inside it at different locations (p1, p2, p3, etc.). Make those locations quite random. At each location, I would place a small substring of S1. Each substring can be of varying length (l1, l2, etc.). Then, in the code, I would dynamically retrieve each part and concatenate all the parts to reconstruct S1. The code therefore stores two arrays of integers: one holding the position of each substring [p1, p2, p3, ...], and a second holding how many characters to read from each position [l1, l2, l3, ...].

In such a case the master string (Ms) would be visible, but the curious person will not be able to know how to get back S1 unless the code is reverse engineered, the two arrays retrieved and the same process carried out. Of course the master string (Ms) should be reasonably large and quite random.
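The reconstruction step described above could be sketched in C++ roughly as follows. This is only an illustration: the helper names `reconstruct` and `demo_secret`, the tiny master string, and the position/length values are all made up for the example, and a real master string would be far longer and random-looking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: rebuilds the secret from slices of a decoy master string.
// positions[i] is where slice i starts; lengths[i] is how many characters it has.
std::string reconstruct(const std::string& master,
                        const std::vector<std::size_t>& positions,
                        const std::vector<std::size_t>& lengths)
{
    assert(positions.size() == lengths.size());
    std::string secret;
    for (std::size_t i = 0; i < positions.size(); ++i) {
        secret += master.substr(positions[i], lengths[i]);
    }
    return secret;
}

// Illustrative data only (assumed values, not from the answer).
std::string demo_secret()
{
    const std::string master = "r ktZeyq9MyW3";
    return reconstruct(master, {9, 1, 5}, {2, 2, 2});  // "My" + " k" + "ey"
}
```

Calling `demo_secret()` yields "My key", while the binary only contains the scrambled master string and two small integer arrays.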

In a few cases such strings can be made stronger, namely when you can use different strings for the same internal task. For example, I may want to dynamically construct a pepper for hashing: I would concatenate the current Unix timestamp to the reconstructed string (S1) and use S1+timestamp as the string.

Of course, if the string to be protected is a constant then this additional timestamp based approach is not possible.

This was inspired by an incident in the story of Alibaba: one of the 40 thieves had found Alibaba's house in the village. He used chalk to write an 'x' on the door of that house. Fortunately, Alibaba's girlfriend saw it and realized that the thief would return with the other 39 and attack the house. So she marked an 'x' on all the houses of the village. That is what made me think the same thing could be done: just chop up the string to be obfuscated and bury each part in the "village" of the master string.

Hope this is in line with what was asked and it helps. Feedback appreciated.

Upvotes: 0

Alex Cohn

Reputation: 57173

One can use llvm-obfuscator (e.g. this fork) to get transparent string encryption. Setup may be kind of painful, especially if you want to integrate it into Xcode (instructions are available online, but they require adaptation for each new release of LLVM and of Xcode).

Upvotes: 0

superreeen

Reputation: 149

You can take a look at the antispy C/C++ Obfuscation Library for all platforms. It offers a range of obfuscation techniques.

Their string encryption will solve your problem.

Upvotes: 2

Michael Haephrati

Reputation: 4225

You can use a C++ library I have developed for that purpose. Another article, which is much simpler to implement, won best C++ article of September 2017. For a simpler way to hide strings, see TinyObfuscate.

Upvotes: 5

mafonya

Reputation: 2180

For C check this out: https://github.com/mafonya/c_hide_strings

For C++ this:

#include <string>

class Alpha : public std::string
{
public:
    Alpha(const std::string& str) : std::string(str) {}

    // Appends one character and returns a copy so that calls can be chained.
    Alpha c(char ch) {
        this->push_back(ch);
        return *this;
    }
};

In order to use this, just include Alpha and:

Alpha str("");
std::string myStr = str.c('T').c('e').c('s').c('t');

So myStr is "Test" now, and the string is hidden from the strings table in the binary.

Upvotes: 10

Bill

Reputation: 21

Here's a Perl script that generates obfuscated C code to hide a plaintext password from the "strings" program.

  obfuscate_password("myPassword123");

  sub obfuscate_password($) {

  my $string = shift;
  my @c = split(//, $string);
  push(@c, "skip"); # Skip Null Terminator
                    # using memset to clear this byte
  # Add Decoy Characters
  for($i=0; $i < 100; $i++) {
    $ch = rand(255);
    next if ($ch == 0);
    push(@c, chr($ch));
  }                     
  my $count1 = @c;
  print "  int x1, x2, x3, x4;\n";
  print "  char password[$count1];\n";
  print "  memset(password, 0, $count1);\n";
  my $count2 = 0;
  my %dict  = ();
  # Visit every index exactly once, in random order
  while ($count2 < $count1) {
    my $x = int(rand($count1));
    next if (defined($dict{$x}));
    $dict{$x} = 1;
    $count2++;
    if ($c[$x] ne "skip") {
      my $y = obfuscate_expr($count1, $x);
      print "  $y password[x4] = (char)" . ord($c[$x]) . ";\n";
    }
  }
  }

  sub obfuscate_expr($$) {
    my $count  = shift;
    my $target = shift;
    #return $target;

    # Search for random a, b, c such that a - b + c equals the target index
    while(1) {

       my $a = int(rand($count*2));
       my $b = int(rand($count*2));
       my $c = int(rand($count*2));
       next if (($a == 0) || ($b == 0) || ($c == 0));
       my $y = $a - $b + $c;
       if ($y == $target) {
          return "x1=$a; x2=$b; x3=$c; x4=x1-x2+x3;";
       }
    }
  }

Upvotes: 0

Bill

Reputation: 21

Create a function that assigns your password to a static char array and returns a pointer to this array. Then run this function through an obfuscation program.

If the program does a good job, it should be impossible to read your plaintext password using a hex editor to examine the program binary (at least, not without reverse engineering the assembly language). That should stop all the script kiddies armed with "strings" or hex editors, except for the criminally insane hacker who has nothing better to waste their time on.

Upvotes: 1

banyudu

Reputation: 1092

I suggest m4.

Store your string with macros, like const string sPassword = _ENCRYPT("real password");
Before the build, expand the macros to encrypted strings with m4, so that your code looks like const string sPassword = "encrypted string";
Decrypt at runtime.

Upvotes: 0

Van Uitkon

Reputation: 356

Encrypt the encryption key with another code. Show an image of the other code to the user. Now the user has to enter the code that he sees (like a captcha, but always the same code). This also makes it impossible for other programs to predict the code. Optionally, you can save a (salted) hash of the code to verify the user's input.

Upvotes: 0

Stephen C

Reputation: 718788

Hiding passwords in your code is security by obscurity. This is harmful because it makes you think you have some level of protection, when in fact you have very little. If something is worth securing, it is worth securing properly.

PS: I know that it doesn't work against real hacker, but it's much better than nothing...

Actually, in a lot of situations nothing is better than weak security. At least you know exactly where you stand. You don't need to be a "real hacker" to circumvent an embedded password ...

EDIT: Responding to this comment:

I know about pairs of keys, but it not acceptable in this case. I refactoring existing appication which uses Blowfish encryption. Encrypted data passed to server and server decrypt data. I can't change ecryption algorithm because I should provide backward compatibility.

If you care about security at all, maintaining backwards compatibility is a REALLY BAD reason to leave yourself vulnerable with embedded passwords. It is a GOOD THING to break backwards compatibility with an insecure security scheme.

It is like when the street kids discover that you leave your front door key under the mat, but you keep doing it because grandpa expects to find it there.

Upvotes: 10

Harvey

Reputation: 5821

I wonder if after first obscuring it like others have mentioned, you could embed your string in an assembly block to try and make it look like instructions. You could then have an "if 0" or "goto just_past_string_assembly" to jump over the "code" which is really hiding your string. This would probably require a bit more work to retrieve the string in code (a one-time coding cost), but it might prove to be a bit more obscure.

Upvotes: 0

Adam Liss

Reputation: 48290

As noted in the comment to pavium's answer, you have two choices:

Secure the key
Secure the decryption algorithm

Unfortunately, if you must resort to embedding both the key and the algorithm within the code, neither is truly secret, so you're left with the (far weaker) alternative of security through obscurity. In other words, as you mentioned, you need a clever way to hide either or both of them inside your executable.

Here are some options, though you need to remember that none of these is truly secure according to any cryptographic best practices, and each has its drawbacks:

Disguise your key as a string that would normally appear within the code. One example would be the format string of a printf() statement, which tends to have numbers, letters, and punctuation.
Hash some or all of the code or data segments on startup, and use that as the key. (You'll need to be a bit clever about this to ensure the key doesn't change unexpectedly!) This has a potentially desirable side-effect of verifying the hashed portion of your code each time it runs.
Generate the key at run-time from something that is unique to (and constant within) the system, for example by hashing the MAC address of a network adapter.
Create the key by choosing bytes from other data. If you have static or global data, regardless of type (int, char, etc.), take a byte from somewhere within each variable after it's initialized (to a non-zero value, of course) and before it changes.

Please let us know how you solve the problem!

Edit: You commented that you're refactoring existing code, so I'll assume you can't necessarily choose the key yourself. In that case, follow a 2-step process: Use one of the above methods to encrypt the key itself, then use that key to decrypt the users' data.
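A rough sketch of that 2-step idea, combined with the "choose bytes from other data" option: the mask is assembled from values the program holds anyway, and the real key is stored only in XOR-wrapped form. Every name and value here (`kTimeoutMs`, `kVersion`, `kUsage`, `kWrappedKey`, `unwrap_key`, and the placeholder key "Key") is an assumption made for illustration, and XOR stands in for whatever wrapping you would actually use. An optimizing compiler may fold some of this at build time, so it is worth inspecting the resulting binary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical program data that would exist in the executable anyway.
static const std::uint32_t kTimeoutMs = 30000;   // 0x7530
static const std::uint32_t kVersion   = 0x0102;
static const char kUsage[] = "Usage: %s <file>\n";

// The real key XORed with the mask bytes below, computed offline.
// Here the hidden key is the 3-byte placeholder "Key".
static const unsigned char kWrappedKey[] = { 0x7B, 0x67, 0x1E };

std::string unwrap_key()
{
    // Step 1: derive the mask from bytes of unrelated data.
    const unsigned char mask[] = {
        static_cast<unsigned char>(kTimeoutMs & 0xFF),  // 0x30
        static_cast<unsigned char>(kVersion & 0xFF),    // 0x02
        static_cast<unsigned char>(kUsage[3]),          // 'g' = 0x67
    };
    assert(sizeof(mask) == sizeof(kWrappedKey));

    // Step 2: use the mask to recover the fixed key.
    std::string key;
    for (std::size_t i = 0; i < sizeof(kWrappedKey); ++i) {
        key += static_cast<char>(kWrappedKey[i] ^ mask[i]);
    }
    return key;
}
```

Using shifts and masks rather than reading raw bytes of the integers keeps the result independent of the machine's byte order.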

Upvotes: 48

Dmitriy

Reputation: 3415

I'm sorry for the long answer.

Your answers are absolutely correct, but the question was how to hide a string and do it nicely.

I did it this way:

#include <iostream>
#include "HideString.h"

DEFINE_HIDDEN_STRING(EncryptionKey, 0x7f, ('M')('y')(' ')('s')('t')('r')('o')('n')('g')(' ')('e')('n')('c')('r')('y')('p')('t')('i')('o')('n')(' ')('k')('e')('y'))
DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))

int main()
{
    std::cout << GetEncryptionKey() << std::endl;
    std::cout << GetEncryptionKey2() << std::endl;

    return 0;
}

HideString.h:

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/seq/enum.hpp>

#define CRYPT_MACRO(r, d, i, elem) ( elem ^ ( d - i ) )

#define DEFINE_HIDDEN_STRING(NAME, SEED, SEQ)\
static const char* BOOST_PP_CAT(Get, NAME)()\
{\
    static char data[] = {\
        BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ)),\
        '\0'\
    };\
\
    static bool isEncrypted = true;\
    if ( isEncrypted )\
    {\
        for (unsigned i = 0; i < ( sizeof(data) / sizeof(data[0]) ) - 1; ++i)\
        {\
            data[i] = CRYPT_MACRO(_, SEED, i, data[i]);\
        }\
\
        isEncrypted = false;\
    }\
\
    return data;\
}

The trickiest line in HideString.h is:

BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ))

Let me explain the line. For the code:

DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))

BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ)

generates the sequence:

( 'T'  ^ ( 0x27 - 0 ) ) ( 'e'  ^ ( 0x27 - 1 ) ) ( 's'  ^ ( 0x27 - 2 ) ) ( 't'  ^ ( 0x27 - 3 ) )

BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ))

generates:

'T' ^ ( 0x27 - 0 ), 'e' ^ ( 0x27 - 1 ), 's' ^ ( 0x27 - 2 ), 't' ^ ( 0x27 - 3 )

and finally,

DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))

generates:

static const char* GetEncryptionKey2()
{
    static char data[] = {
        'T' ^ ( 0x27 - 0 ), 'e' ^ ( 0x27 - 1 ), 's' ^ ( 0x27 - 2 ), 't' ^ ( 0x27 - 3 ),
        '\0'
    };
    static bool isEncrypted = true;
    if ( isEncrypted )
    {
        for (unsigned i = 0; i < ( sizeof(data) / sizeof(data[0]) ) - 1; ++i)
        {
            data[i] = ( data[i] ^ ( 0x27 - i ) );
        }
        isEncrypted = false;
    }
    return data;
}

The data for "My strong encryption key" looks like this:

0x00B0200C  32 07 5d 0f 0f 08 16 16 10 56 10 1a 10 00 08  2.]......V.....
0x00B0201B  00 1b 07 02 02 4b 01 0c 11 00 00 00 00 00 00  .....K.........
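The bytes in this dump can be cross-checked by applying the same `elem ^ (seed - i)` transform at run time. The sketch below does that; `xor_with_seed` is a hypothetical helper written for this check and is not part of HideString.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same transform as CRYPT_MACRO: elem ^ (seed - i).
// Applying it twice with the same seed restores the original bytes.
std::vector<unsigned char> xor_with_seed(const std::vector<unsigned char>& in,
                                         unsigned seed)
{
    assert(in.size() <= seed + 1);  // keeps (seed - i) from going below zero
    std::vector<unsigned char> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<unsigned char>(in[i] ^ (seed - i));
    }
    return out;
}
```

For "My strong encryption key" with seed 0x7f, the first bytes come out as 32 07 5d, matching the dump: 'M' (0x4D) ^ 0x7F = 0x32, 'y' (0x79) ^ 0x7E = 0x07, ' ' (0x20) ^ 0x7D = 0x5D.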

Thank you very much for your answers!

Upvotes: 57

old_timer

Reputation: 71526

I think you want to make it look like instructions. Your example of

x[y++]='M'; x[y++]='y'; ...

would do just that. However, a long sequence of repeated instructions with little variation may stand out, and that would be bad. The byte in question may also get encoded in the instruction as is, and that would be bad too. So perhaps use the XOR method, plus some other tricks to keep that long section of code from standing out, such as some dummy function calls. It depends on your processor as well. On ARM, for example, it is really easy to look at binary data and tell the instructions from the data, and from there (if you are looking for a default key) to pick out what might be the key, because it is data but not ASCII, and attack that. The same goes for a block of similar instructions in which only the immediate field varies, even if you have the compiler XOR the data with a constant.

Upvotes: 0

Corin

Reputation: 2457

I was once in a similarly awkward position. I had data that needed to be in the binary but not in plain text. My solution was to encrypt the data using a very simple scheme that made it look like the rest of the program. I encrypted it by writing a program that took a string, converted all the characters to the ASCII code (padded with zeros as necessary to get a three digit number) and then added a random digit to the beginning and the end of the 3 digit code. Thus each character of the string was represented by 5 characters (all numbers) in the encrypted string. I pasted that string into the application as a constant and then when I needed to use the string, I decrypted and stored the result in a variable just long enough to do what I needed to.

So to use your example, "My strong encryption key" becomes "207719121310329211541116181145111157110071030703283101101109309926114151216611289116161056811109110470321510787101511213". Then when you need your encryption key, decode it by undoing the process.
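The decoding side of this scheme might look like the following C++ sketch. The function name `decode_digits` is made up here; the layout (one random digit, a three-digit ASCII code, one random digit) is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes the 5-digits-per-character scheme:
// each group is <pad digit><3-digit ASCII code><pad digit>.
std::string decode_digits(const std::string& encoded)
{
    assert(encoded.size() % 5 == 0);
    std::string out;
    for (std::size_t i = 0; i + 5 <= encoded.size(); i += 5) {
        // Skip the leading pad digit, read three digits, ignore the trailing pad.
        const int code = std::stoi(encoded.substr(i + 1, 3));
        out += static_cast<char>(code);
    }
    return out;
}
```

For instance, the first two groups of the string above, "20771" and "91213", carry the codes 077 and 121, which are 'M' and 'y'.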

It's certainly not bulletproof but I wasn't aiming for that.

Upvotes: 4

Ken

Reputation: 878

Post it as a code golf problem
Wait for a solution written in J
Embed a J interpreter in your app

Upvotes: 22

Wim ten Brink

Reputation: 26682

This is as secure as leaving your bike unlocked in Amsterdam, the Netherlands near Central Station. (Blink, and it's gone!)

If you're trying to add security to your application then you're doomed to fail from the start since any protection scheme will fail. All you can do is make it more complex for a hacker to find the information he needs. Still, a few tricks:

*) Make sure the string is stored as UTF-16 in your binary.

*) Add numbers and special characters to the string.

*) Use an array of 32-bit integers instead of a string! Convert each to a string and concatenate them all.

*) Use a GUID, store it as binary and convert it to a string to use.

And if you really need some pre-defined text, encrypt it and store the encrypted value in your binary. Decrypt it at runtime, where the key used to decrypt it is one of the options I've mentioned before.

Do realize that hackers will tend to crack your application in other ways than this. Even an expert at cryptography will not be able to keep something safe. In general, the only thing that protects you is the profit a hacker can gain from hacking your code, compared to the cost of hacking it. (These costs would often be just a lot of time, but if it takes a week to hack your application and just 2 days to hack something else, something else is more likely to be attacked.)

Reply to comment: UTF-16 would be two bytes per character, thus harder to recognize for users who look at a dump of the binary, simply because there's an additional byte between every letter. You can still see the words, though. UTF-32 would be even better because it adds more space between letters. Then again, you could also compress the text a bit by changing to a 6-bit-per-character scheme. Every 4 characters would then compact to three bytes. But this would restrict you to 2x26 letters, 10 digits and perhaps the space and dot to get to 64 characters.

The use of a GUID is practical if you store the GUID in its binary format, not its textual format. A GUID is 16 bytes long and can be randomly generated. Thus it's difficult to guess the GUID that's used as a password. But if you still need to send plain text over, a GUID could be converted to a string representation, something like "3F2504E0-4F89-11D3-9A0C-0305E82C3301". (Or Base64-encoded as "7QDBkvCA1+B9K/U0vrQx1A==".) But users won't see any plain text in the code, just some apparently random data. Not all bytes in a GUID are random, though. There's a version number hidden in GUIDs. Using a GUID isn't the best option for cryptographic purposes, either. It's either calculated based on your MAC address or from a pseudo-random number, making it reasonably predictable. Still, it's easy to create and easy to store, convert and use. Creating something longer doesn't add more value since a hacker would just try to find other tricks to crack the security. It's just a question of how willing they are to invest more time in analyzing the binaries.
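As a sketch of the 6-bit idea, four characters from a 64-symbol alphabet can be packed into three bytes as shown below. The alphabet order and the names `kAlphabet`, `pack6` and `unpack6` are assumptions for this example, and the input length is required to be a multiple of 4 to keep the code short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed alphabet: 26 upper, 26 lower, 10 digits, space and dot (64 symbols).
const std::string kAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .";

// Packs every 4 characters (6 bits each) into 3 bytes.
std::vector<unsigned char> pack6(const std::string& text)
{
    assert(text.size() % 4 == 0);
    std::vector<unsigned char> out;
    for (std::size_t i = 0; i < text.size(); i += 4) {
        std::uint32_t bits = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const std::size_t idx = kAlphabet.find(text[i + j]);
            assert(idx != std::string::npos);  // character not in the alphabet
            bits = (bits << 6) | static_cast<std::uint32_t>(idx);
        }
        out.push_back(static_cast<unsigned char>((bits >> 16) & 0xFF));
        out.push_back(static_cast<unsigned char>((bits >> 8) & 0xFF));
        out.push_back(static_cast<unsigned char>(bits & 0xFF));
    }
    return out;
}

// Reverses pack6: every 3 bytes expand back to 4 characters.
std::string unpack6(const std::vector<unsigned char>& bytes)
{
    assert(bytes.size() % 3 == 0);
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        const std::uint32_t bits = (static_cast<std::uint32_t>(bytes[i]) << 16) |
                                   (static_cast<std::uint32_t>(bytes[i + 1]) << 8) |
                                   static_cast<std::uint32_t>(bytes[i + 2]);
        for (int shift = 18; shift >= 0; shift -= 6) {
            out += kAlphabet[(bits >> shift) & 0x3F];
        }
    }
    return out;
}
```

The 24-character key from the question fits this alphabet and packs into 18 bytes, none of which line up with the original letters on byte boundaries.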

In general, the most important thing that keeps your applications safe is the number of people who are interested in it. If no one cares about your application then no one will bother to hack it either. When you're the top product with 500 million users, then your application is cracked within an hour.

Upvotes: 9

EKS

Reputation: 5623

Here is an example of what they explained, but be aware that this will be broken fairly easily by anyone who is a "hacker"; it will only stop kiddies with a hex editor. The example I provided simply adds the value 80 to each character, subtracts the character's index from it, and then builds a string again. If you were planning on storing this in a binary file, there are plenty of ways to convert a string to a byte[] array.

When you have this working in your app, I would make the "math" I used a bit more complex.

To make it clear for those not understanding: you encrypt the string before you save it, so it's NOT saved in clear text. If the encrypted text is never going to change, you don't even include the encrypt function in your release; you just have the decrypt one. So when you want to decrypt the string, you read the file and then decrypt the content. This means your string is never stored in a file in plaintext format.

You can of course also have the encrypted string stored as a constant string in your application and decrypt it when you need it. Choose what is right for your problem depending on the size of the string and how often it changes.

string Encrypted = EncryptMystring("AAbbBb");
string Decrypted = DecryptMystring(Encrypted);

string DecryptMystring(string RawStr)
{
    string DecryptedStr = "";
    for (int i = 0; i < RawStr.Length; i++)
    {
        DecryptedStr += (char)((int)RawStr[i] - 80 + i);
    }

    return DecryptedStr;
}

string EncryptMystring(string RawStr)
{
    string EncryptedStr = "";
    for (int i = 0; i < RawStr.Length; i++)
    {
        EncryptedStr += (char)((int)RawStr[i] + 80 - i);
    }

    return EncryptedStr;
}

Upvotes: 2

MSalters

Reputation: 179799

It's a client-server application! Don't store it in the client itself, that's the place where hackers will obviously look. Instead, add (for your new client only) an extra server function (over HTTPS) to retrieve this password. Thus this password should never hit the client disk.

As a bonus, it becomes a lot easier to fix the server later. Just send a different, per-client time-limited password every time. Don't forget to allow for longer passwords in your new client.

Upvotes: 3

MusiGenesis

Reputation: 75296

If you store the encryption key in reverse ("yek noitpyrcne gnorts yM") and then reverse it in your code (String.Reverse), this would prevent a simple search through the binary for the text of your encryption key.

To reiterate the point made by every other poster here, however, this will accomplish virtually nothing for you in terms of security.
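In C++ the same trick could be sketched with `std::reverse` instead of String.Reverse; the function name `unreverse` is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Takes the reversed text as stored in the binary and restores the real key.
std::string unreverse(std::string stored)
{
    assert(!stored.empty());
    std::reverse(stored.begin(), stored.end());
    return stored;
}
```

Calling `unreverse("yek noitpyrcne gnorts yM")` gives back "My strong encryption key", while a search for the forward text in the binary finds nothing.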

Upvotes: 1

T.J. Crowder

Reputation: 1074266

Your example doesn't hide the string at all; the string is still presented as a series of characters in the output.

There are a variety of ways you can obfuscate strings. There's the simple substitution cypher, or you might perform a mathematical operation on each character (an XOR, for instance) where the result feeds into the next character's operation, etc., etc.

The goal would be to end up with data that doesn't look like a string. For example, if you're working in most western languages, most of your character values will be in the range 32-127, so your goal would be for the operation to put most of them out of that range, so they don't draw attention.
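One possible reading of the "result feeds into the next character's operation" idea is a chained XOR, sketched below. The names `chain_encode` and `chain_decode` and the seed value used in the usage note are assumptions. For pure ASCII input, a seed with its high bit set pushes every stored byte to 128 or above, outside the printable range.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each output byte is the input byte XORed with the previous output byte.
std::vector<unsigned char> chain_encode(const std::string& plain, unsigned char seed)
{
    assert((seed & 0x80) != 0);  // high bit set keeps ASCII input above 127
    std::vector<unsigned char> out;
    unsigned char prev = seed;
    for (unsigned char ch : plain) {
        prev = static_cast<unsigned char>(ch ^ prev);
        out.push_back(prev);
    }
    return out;
}

// Undoes chain_encode: XOR each byte with the encoded byte before it.
std::string chain_decode(const std::vector<unsigned char>& data, unsigned char seed)
{
    std::string out;
    unsigned char prev = seed;
    for (unsigned char byte : data) {
        out += static_cast<char>(byte ^ prev);
        prev = byte;
    }
    return out;
}
```

With a seed such as 0xA5, `chain_decode(chain_encode(text, 0xA5), 0xA5)` returns the original text, and identical letters in the input no longer produce identical bytes in the output.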

Upvotes: 10

pavium

Reputation: 15118

The technology of encryption is strong enough to secure important data without hiding it in a binary file.

Or is your idea to use a binary file to disguise the fact that something is hidden?

That would be called steganography.

Upvotes: 3

sleske

Reputation: 83609

You can encode the string using some trivial encoding, e.g. XOR with binary 01010101. No real protection of course, but it foils the use of tools like strings.
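A minimal sketch of that encoding in C++, where binary 01010101 is 0x55. The array `kStored` holds the word "Test" already XORed offline; both it and the name `xor_decode` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "Test" XORed byte by byte with 0x55, computed ahead of time.
static const unsigned char kStored[] = { 0x01, 0x30, 0x26, 0x21 };

// XOR is its own inverse, so the same operation decodes the data.
std::string xor_decode(const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    std::string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        out += static_cast<char>(data[i] ^ 0x55);
    }
    return out;
}
```

`xor_decode(kStored, sizeof kStored)` returns "Test", and the plain word never appears in the data section.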

Upvotes: 2
