Dolphiniac

Reputation: 1792

How to access millions of bits for hashing

I'm doing MD5 hashing on an executable. I've used a python script to read binary from the executable into a text file, but if I were to read in this constructed file to a C program, I would be handling MBs of data, as the ones and zeroes are being treated as chars, taking 8 bits for each 1 bit number. Would it be possible to read these in as single bits each? How badly would a program perform if I made, say, a 10MB array to hold all the characters I might need for the length of the binary conversion and padding for the hash? If this is unthinkable, would there be a better way to manipulate the data?

Upvotes: 0

Views: 213

Answers (1)

user123

Reputation: 9071

Since you tagged the question C and C++, I'll go for C.

Would it be possible to read these in as single bits each?

Yes. Read 8 characters at a time from the file and combine those 1s and 0s into a single byte. You don't need a 10MB array for this.

First, read 8 characters from the file. Each character ('0' or '1') is converted to its integer value (0 or 1) and then bit-shifted into place to form a new byte.

unsigned char bits[8];
while (fread(bits, 1, 8, file) == 8) {
    for (unsigned int i = 0; i < 8; i++) {
        bits[i] -= '0';
    }

    /* unsigned char: plain char may be signed and cannot portably hold 128..255 */
    unsigned char byte = (bits[0] << 7) | (bits[1] << 6) |
                         (bits[2] << 5) | (bits[3] << 4) |
                         (bits[4] << 3) | (bits[5] << 2) |
                         (bits[6] << 1) | (bits[7]     );

    /* update MD5 Hash here */
}

Then, you would update your MD5 hash with the newly read byte.
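As a rough, self-contained sketch of the packing step above: the helper names `pack_bits` and `pack_stream` are made up for illustration, and the code assumes the text file contains nothing but '0' and '1' characters (no newlines or spaces). It rejects any group that contains another character instead of silently producing a garbage byte.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: packs eight '0'/'1' characters (most significant
 * bit first) into one byte. Returns 0 on success, -1 if any character
 * is not '0' or '1'. */
int pack_bits(const unsigned char chars[8], unsigned char *out)
{
    unsigned char byte = 0;
    for (size_t i = 0; i < 8; i++) {
        if (chars[i] != '0' && chars[i] != '1') {
            return -1;
        }
        /* Shift the bits collected so far and append the new one. */
        byte = (unsigned char)((byte << 1) | (chars[i] - '0'));
    }
    *out = byte;
    return 0;
}

/* Hypothetical helper: reads '0'/'1' characters from `in` and writes the
 * packed bytes into dst (capacity cap). Returns the number of bytes
 * written. Stops at end of input, at the first invalid group, or when
 * dst is full. */
size_t pack_stream(FILE *in, unsigned char *dst, size_t cap)
{
    unsigned char chars[8];
    size_t n = 0;
    while (n < cap && fread(chars, 1, 8, in) == 8) {
        if (pack_bits(chars, &dst[n]) != 0) {
            break;
        }
        n++;
    }
    return n;
}
```

Calling `pack_stream` in a loop with a small buffer (64 bytes, for instance) and passing each filled buffer to the hash update keeps memory use constant regardless of file size.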


Edit: A typical MD5 implementation breaks its input into chunks of 512 bits (64 bytes) before processing. You can bypass that buffering in the implementation itself (not recommended, though) by accumulating 64 packed bytes, i.e. 512 characters from the text file, and then updating the hash directly with each full block.

unsigned char buffer[64];
unsigned char bits[8];
unsigned int index = 0;

while (fread(bits, 1, 8, file) == 8) {
    for (unsigned int i = 0; i < 8; i++) {
        bits[i] -= '0';
    }

    buffer[index++] = (bits[0] << 7) | (bits[1] << 6) |
                      (bits[2] << 5) | (bits[3] << 4) |
                      (bits[4] << 3) | (bits[5] << 2) |
                      (bits[6] << 1) | (bits[7]     );

    if (index == 64) {
        index = 0;
        /* update MD5 hash with 64 byte buffer */
    }
}

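/* This sends the remaining data to the MD5 hash function. */
/* It's not likely that your file has exactly 512N chars. */
/* Do not zero-fill the block yourself: MD5 padding is a 0x80 byte, */
/* then zeros, then the 64-bit message length, and zero-filling alone */
/* would change the resulting hash. */
if (index != 0) {
    /* update MD5 hash with the first `index` bytes of buffer, */
    /* then let the finalization step apply the padding. */
}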
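If you really do feed raw 64-byte blocks to the compression step yourself, the last block has to follow MD5's own padding rule (RFC 1321): append a single 0x80 byte, then zeros until the length is 56 mod 64, then the message length in bits as a 64-bit little-endian value. A minimal sketch of that rule follows; the name `md5_pad_tail` and its interface are assumptions for illustration, not part of any particular MD5 library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds the padded tail of an MD5 message.
 *   tail      - the last (total_len % 64) bytes that were not yet hashed
 *   total_len - length of the whole message in bytes
 *   out       - receives 64 or 128 bytes of padded data
 * Returns the number of bytes written to out (64 or 128). */
size_t md5_pad_tail(const unsigned char *tail, uint64_t total_len,
                    unsigned char out[128])
{
    size_t rem = (size_t)(total_len % 64);
    /* One block is enough only if 0x80 and the 8-byte length both fit
     * after the leftover bytes; otherwise a second block is needed. */
    size_t padded = (rem < 56) ? 64 : 128;
    uint64_t bit_len = total_len * 8;

    memset(out, 0, padded);
    if (rem > 0) {
        memcpy(out, tail, rem);
    }
    out[rem] = 0x80; /* the single '1' bit followed by seven '0' bits */

    /* Message length in bits, least significant byte first. */
    for (size_t i = 0; i < 8; i++) {
        out[padded - 8 + i] = (unsigned char)(bit_len >> (8 * i));
    }
    return padded;
}
```

The returned 64 or 128 bytes would then be processed as one or two ordinary blocks. In practice, a library's own finalization routine already does this, which is one reason bypassing it is not recommended.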

Upvotes: 1
