user1529891

Reputation:

Explicitly removing sensitive data from memory?

The recent leak from Wikileaks has the CIA doing the following:

DO explicitly remove sensitive data (encryption keys, raw collection data, shellcode, uploaded modules, etc) from memory as soon as the data is no longer needed in plain-text form.

DO NOT RELY ON THE OPERATING SYSTEM TO DO THIS UPON TERMINATION OF EXECUTION.

Being a developer in the *nix world, I see this as merely changing the value of a variable (making sure I pass by reference, not by value): if it's a string that's 100 characters, I overwrite it with 101 zero bytes (including the terminator). Is it really this simple? If not, why, and what should be done instead?

Note: There are similar questions that ask this, but they are in the C# and Windows world, so I do not consider this question a duplicate.

Upvotes: 11

Views: 2218

Answers (2)

LSerni

Reputation: 57418

Being a developer in the *nix world, I see this as merely changing the value of a variable (making sure I pass by reference, not by value): if it's a string that's 100 characters, I overwrite it with 101 zero bytes (including the terminator). Is it really this simple? If not, why, and what should be done instead?

It should be this simple. The devil is in the details.

  • memory allocation functions, such as realloc, are not guaranteed to leave memory alone (you should not rely on their doing it one way or the other - see also this question). If you allocate 1K of memory, then realloc it to 10K, your original 1K might still be there somewhere else, containing its sensitive payload. It might then be allocated to another, non-sensitive variable or buffer, or not, and through that new variable it might be possible to access part or all of the old content, much as happened with slack space on some filesystems.
  • manually zeroing memory (and, with most compilers, bzero and memset count as manual loops) might be blithely optimized out, especially if you're zeroing a local variable ("bug" - actually a feature, with workaround).
  • some functions might leave "traces" in local buffers or in memory they allocate and deallocate.
  • in some languages and frameworks, whole portions of data could end up being moved around (e.g. during so-called "garbage collection", as noticed by @gene). You may be able to tell the GC not to process your sensitive area or otherwise "pin" it to that effect, and if so, must do so. Otherwise, data might end up in multiple, partial copies.
  • information might have come through and left traces you're not aware of (trivial example: a password sent through the network might linger in the network library read buffer).
  • live memory might be swapped out to disk.

Example of realloc doing its thing. The memory gets partly rewritten, and with some allocators this will only "work" if a is not the only allocated area (you may need to declare a third pointer c and allocate something immediately after a, so that a is not the last object and free to grow in place):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

int main(void) {
        char *a;
        char *b;
        a = malloc(1024);
        strcpy(a, "Hello");
        strcpy(a + 200, "world");
        printf("a at %08ld is %s...%s\n", (long)(intptr_t)a, a, a + 200);
        b = realloc(a, 10240);
        /* Reading through a after realloc is undefined behavior --
           which is exactly the point: the old block is no longer
           under our control, yet "world" may still be sitting there. */
        strcpy(b, "Hey!");
        printf("a at %08ld is %s...%s, b at %08ld is %s\n",
               (long)(intptr_t)a, a, a + 200, (long)(intptr_t)b, b);
        return 0;
}

Output:

 a at 19828752 is Hello...world
 a at 19828752 is 8????...world, b at 19830832 is Hey!

So the memory at address a was partly rewritten - "Hello" is lost, "world" is still there (as well as at b + 200).

So you need to handle reallocations of sensitive areas yourself; better yet, pre-allocate it all at program startup. Then, tell the OS that a sensitive area of memory must never be swapped to disk. Then you need to zero it in such a way that the compiler can't interfere. And you need to use a low-level enough language that you're sure doesn't do things by itself: a simple string concatenation could spawn two or three copies of the data - I'm fairly certain it happened in PHP 5.2.

Ages ago, before valgrind existed, I wrote myself a small library inspired by Steve Maguire's Writing Solid Code. Apart from overriding the various memory and string functions, I ended up overwriting freed memory and then calculating the checksum of the overwritten buffer. This was not for security; I used it to track buffer over/underflows, double frees, use of freed memory -- that kind of thing.

And then you need to ensure your failsafes work - for example, what happens if the program aborts? Might it be possible to make it abort on purpose?

You need to implement defense in depth, and always look at ways to keep as little information around as possible - for example clearing the intermediate buffers during a calculation rather than waiting and freeing the whole lot in one fell swoop at the very end, or just when exiting the program; keeping hashes instead of passwords when at all possible; and so on.

Of course all this depends on how sensitive the information is and what the attack surface is likely to be (mandatory xkcd reference: here). Rebooting the PC into a memtest86 image could be a viable alternative. Think of a dual-boot computer with memtest86 set to test memory and power down the PC as the default boot option. When you want to turn off the system... you reboot it instead. The PC will reboot, enter memtest86 by default, and before powering off for good, it'll fill all available RAM with marching troops of zeros and ones. Good luck recovering information from that with a cold-boot attack.

Upvotes: 7

josh poley

Reputation: 7479

Zeroing out secrets (passwords, keys, etc) immediately after you are done with them is fairly standard practice. The difficulty is in dealing with language and platform features that can get in your way.

For example, C++ compilers can optimize out calls to memset if the compiler determines that the data is not read after the write. Or the operating system may have paged the memory out to disk, potentially leaving the data recoverable that way.

Upvotes: 5
