LadyDefile
LadyDefile

Reputation: 83

C threads corrupting each other

So I've got a weird issue that I don't quite understand why it is happening. In md4checker, I launch n pthreads that get and check an MD4 hash. In md4.c, I generate an MD4 hash. If I set n threads to 1, it works flawlessly. It generates the MD4 hash with perfect accuracy (I ran it in a loop for 1,000,000 tries and not a single time did it fail). However, when I run this same code with n threads as 2 (or higher) it fails a lot and randomly.

The md4.c file is derivative of another I found online but I tweaked it a little because the original md4.c had a memory leak (and running 50,000,000+ hashes made that leak fill up 16GB of RAM in about 15 minutes). If it was just a matter of it not working, I'd know where to start but I'm genuinely at a loss as to where and why multiple threads corrupt each other here.

edit: If I add usleep(100) to the worker thread in md4checker.c, it cuts the failure rate to 10% of what it normally does.

md4checker.c (works when running just one):

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/sysinfo.h>

#include "md4.c"

struct data{
    char hash[33];
    int done;
};

void *worker(void *ptr) {
    int count=0;
    char hash[33];
    strcpy(hash, ((struct data *)ptr)->hash);
    hash[32] ='\0';
    char *md4;
    int fails =0;

    int runs =1000;
    while(count < runs) {
        md4 = MD4("cbff7", 5);
        if(strcmp(md4, hash) != 0) {
            ++fails;
        }
        free(md4);
        count++;
    }
    ((struct data *)ptr)->done = 1;
    printf("Done. Failed %d/%d times.\n", fails, runs);
}

void runprocs(int procs) {
    printf("Running process on %d thread(s)\n", procs);

    struct data d ={
        .hash = "4e0d289576880188d4b968fe626bccef\0",
        .done =0
    };

    pthread_t threads[procs];
    void *ptr =&d;

    for(int i=0; i<procs; ++i) {
        int rc = pthread_create(&threads[i], NULL, worker, ptr);
    }
    while (!d.done) {
        usleep(10000);
    }
}

int main(int argc, char *argv[]) {
    if (argc < 2) return -1;
    runprocs(1);
    runprocs(2);
    runprocs(4);
}

After running this four times the output I got was:

Run one:

Running process on 1 thread(s)
Done. Failed 0/1000 times.
Running process on 2 thread(s)
Done. Failed 490/1000 times.
Done. Failed 489/1000 times.
Running process on 4 thread(s)
Done. Failed 941/1000 times.
Done. Failed 883/1000 times.
Done. Failed 847/1000 times.
Done. Failed 473/1000 times.

Run two:

Running process on 1 thread(s)
Done. Failed 0/1000 times.
Running process on 2 thread(s)
Done. Failed 19/1000 times.
Done. Failed 17/1000 times.
Running process on 4 thread(s)
Done. Failed 953/1000 times.
Done. Failed 891/1000 times.
Done. Failed 884/1000 times.
Done. Failed 850/1000 times.

Run three:

Running process on 1 thread(s)
Done. Failed 0/1000 times.
Running process on 2 thread(s)
Done. Failed 431/1000 times.
Done. Failed 371/1000 times.
Running process on 4 thread(s)
Done. Failed 931/1000 times.
Done. Failed 928/1000 times.
Done. Failed 720/1000 times.
Done. Failed 703/1000 times.

Run four:

Running process on 1 thread(s)
Done. Failed 0/1000 times.
Running process on 2 thread(s)
Done. Failed 82/1000 times.
Done. Failed 84/1000 times.
Running process on 4 thread(s)
Done. Failed 909/1000 times.
Done. Failed 928/1000 times.
Done. Failed 790/1000 times.
Done. Failed 808/1000 times.

The first line in each set is perfect (done from main thread). Then it runs it 1,000 times in two new threads and they both print the failed/run result (as you can see in the code above). So why the random number of fails? I'm very confused here, lol. Any help would be greatly appreciated.

md4.c:

#include <stdlib.h>
#include <string.h>
#include <stdint.h>

char *MD4(char *, int); //this is the prototype you want to call. Everything else is internal.


static uint32_t *MD4Digest(uint32_t *w, int len);
static void setMD4Registers(uint32_t, uint32_t, uint32_t, uint32_t);
static uint32_t changeEndianness(uint32_t);
static void resetMD4Registers(void);
static uint32_t stringToUint32(char *);

static const char *BASE16 = "0123456789abcdef=";

#define F(X,Y,Z) (((X)&(Y))|((~(X))&(Z)))
#define G(X,Y,Z) (((X)&(Y))|((X)&(Z))|((Y)&(Z)))
#define H(X,Y,Z) ((X)^(Y)^(Z))

#define LEFTROTATE(A,N) ((A)<<(N))|((A)>>(32-(N)))

#define MD4ROUND1(a,b,c,d,x,s) a += F(b,c,d) + x; a = LEFTROTATE(a, s);
#define MD4ROUND2(a,b,c,d,x,s) a += G(b,c,d) + x + (uint32_t)0x5A827999; a = LEFTROTATE(a, s);
#define MD4ROUND3(a,b,c,d,x,s) a += H(b,c,d) + x + (uint32_t)0x6ED9EBA1; a = LEFTROTATE(a, s);

static uint32_t A = 0x67452301;
static uint32_t B = 0xefcdab89;
static uint32_t C = 0x98badcfe;
static uint32_t D = 0x10325476;

void Concat(char **out, int olen, char* second, int slen) {
    
    if(*out == NULL ) {
        *out = malloc(1);
        *out[1] = '\0';
    }

    char *old = *out;           // Grab the original string.
    //int len = (sizeof(char)*((strlen(old)+strlen(second)+1)));    // Get the length of the combined strings plus 1 for \0

    *out = malloc(olen+slen+1);     // Create the new char array to hold the combined strings.
    memset(*out, 0, olen+slen+1);   // Set all bits to zero in new array.

    char *p = *out;                 // We'll use p to track position for writing the values.
    //strcpy(p, old);                   // Copy the original string to p;
    memcpy(p, old, olen);

    p += olen;                      // Move p forward by the length of old.
    //strcpy(p, second);                // Copy the second string to p
    memcpy(p, second, slen);

    free(old);                      // Free old to prevent memory leak.
    free(second);   
}

int Expand(char **out, int amt) {
    int len = strlen(*out)+amt;         // Get the length of the array + expand amount \0

    char *new;                          // Create a new pointer.
    new = malloc(sizeof(char)*len);     // Create the new char array
    memset(new, 0, sizeof(char)*len);   // Set all bits to zero in new array.

    strcpy(new, *out);                  // Copy the original string to new array;
    free(*out);                         // Free the original memory to prevent leak
    *out = new;

    return len;                         // Return the new memory size
}

char *base16Encode(char *in, int len){
    char *out = malloc(len*2);
    int i,j;

    j=0;
    for(i=0; i<len; i++){
        out[j++]=BASE16[((in[i] & 0xF0)>>4)];
        out[j++]=BASE16[(in[i] & 0x0F)];
    }
    out[j]='\0';
    free(in);
    return out;
}

uint32_t stringToUint32(char *c){
    uint32_t l;
    int i;
    l=0;
    for(i=0; i<4; i++){
        l = l|(((uint32_t)((unsigned char)c[i]))<<(8*(3-i)));
    }
    return l;
}

char *uint32ToString(uint32_t l){
    char *c = malloc(sizeof(uint32_t)+1);
    memset(c, 0, sizeof(uint32_t)+1);

    int i;
    for(i=0; i<4; i++){
        c[i] = (l >> (8*(3-i))) & 0xFF;
    }
    return c;
}

char *MD4(char *str, int temporaryvar){

    uint64_t mlen=strlen(str);          // Get the length of str + 1 for \0
    uint64_t slen=mlen;
    char *m = malloc(mlen+1);           // Create a pointer to manipulate data and give it an array of size mlen
    strcpy(m, str);                     // Copy str to m
    m[mlen] = '\0';                     // Set the last value to 0.

    unsigned char *oneByte = malloc(sizeof(char));
    oneByte[0] = 0x80;
    Concat(&m, mlen, oneByte, 1);       // Add the 1 byte.
    int i, wlen;

    mlen=strlen(m);

    i=((56-mlen)%64);
    if(i<0) i+=64;

    mlen = Expand(&m, i);

    uint32_t *w = malloc(sizeof(uint32_t)*(mlen/4+2));

    for(i=0; i<mlen/4; i++){
        w[i]=stringToUint32(m+(4*i));
    }
    w[i++] = (slen<<3) & 0xFFFFFFFF;
    w[i++] = (slen>>29) & 0xFFFFFFFF;

    wlen=i;
    
    for(i=0; i<wlen-2; ++i){
        w[i]=changeEndianness(w[i]);
    }

    uint32_t *hash = MD4Digest(w,wlen);
    
    char *digest = malloc(1);
    memset(digest, 0, 1);
    
    //digest=newString(NULL,0);
    for(i=0; i<4; i++){
        hash[i]=changeEndianness(hash[i]);
        Concat(&digest, sizeof(uint32_t)*i,uint32ToString(hash[i]), sizeof(uint32_t));
    }


    // Don't forget to free up your memory.
    free(m);
    free(w);
    free(hash);

    return base16Encode(digest, sizeof(uint32_t)*4);
}

uint32_t *MD4Digest(uint32_t *w, int len){
    //assumes message.len is a multiple of 64 bytes.
    int i,j;
    uint32_t X[16];
    uint32_t *digest = malloc(sizeof(uint32_t)*4);
    uint32_t AA, BB, CC, DD;
    
    for(i=0; i<len/16; i++){
        for(j=0; j<16; j++){
            X[j]=w[i*16+j];
        }

        AA=A;
        BB=B;
        CC=C;
        DD=D;

        MD4ROUND1(A,B,C,D,X[0],3);
        MD4ROUND1(D,A,B,C,X[1],7);
        MD4ROUND1(C,D,A,B,X[2],11);
        MD4ROUND1(B,C,D,A,X[3],19);
        MD4ROUND1(A,B,C,D,X[4],3);
        MD4ROUND1(D,A,B,C,X[5],7);
        MD4ROUND1(C,D,A,B,X[6],11);
        MD4ROUND1(B,C,D,A,X[7],19);
        MD4ROUND1(A,B,C,D,X[8],3);
        MD4ROUND1(D,A,B,C,X[9],7);
        MD4ROUND1(C,D,A,B,X[10],11);
        MD4ROUND1(B,C,D,A,X[11],19);
        MD4ROUND1(A,B,C,D,X[12],3);
        MD4ROUND1(D,A,B,C,X[13],7);
        MD4ROUND1(C,D,A,B,X[14],11);
        MD4ROUND1(B,C,D,A,X[15],19);

        MD4ROUND2(A,B,C,D,X[0],3);
        MD4ROUND2(D,A,B,C,X[4],5);
        MD4ROUND2(C,D,A,B,X[8],9);
        MD4ROUND2(B,C,D,A,X[12],13);
        MD4ROUND2(A,B,C,D,X[1],3);
        MD4ROUND2(D,A,B,C,X[5],5);
        MD4ROUND2(C,D,A,B,X[9],9);
        MD4ROUND2(B,C,D,A,X[13],13);
        MD4ROUND2(A,B,C,D,X[2],3);
        MD4ROUND2(D,A,B,C,X[6],5);
        MD4ROUND2(C,D,A,B,X[10],9);
        MD4ROUND2(B,C,D,A,X[14],13);
        MD4ROUND2(A,B,C,D,X[3],3);
        MD4ROUND2(D,A,B,C,X[7],5);
        MD4ROUND2(C,D,A,B,X[11],9);
        MD4ROUND2(B,C,D,A,X[15],13);

        MD4ROUND3(A,B,C,D,X[0],3);
        MD4ROUND3(D,A,B,C,X[8],9);
        MD4ROUND3(C,D,A,B,X[4],11);
        MD4ROUND3(B,C,D,A,X[12],15);
        MD4ROUND3(A,B,C,D,X[2],3);
        MD4ROUND3(D,A,B,C,X[10],9);
        MD4ROUND3(C,D,A,B,X[6],11);
        MD4ROUND3(B,C,D,A,X[14],15);
        MD4ROUND3(A,B,C,D,X[1],3);
        MD4ROUND3(D,A,B,C,X[9],9);
        MD4ROUND3(C,D,A,B,X[5],11);
        MD4ROUND3(B,C,D,A,X[13],15);
        MD4ROUND3(A,B,C,D,X[3],3);
        MD4ROUND3(D,A,B,C,X[11],9);
        MD4ROUND3(C,D,A,B,X[7],11);
        MD4ROUND3(B,C,D,A,X[15],15);

        A+=AA;
        B+=BB;
        C+=CC;
        D+=DD;
    }

    digest[0]=A;
    digest[1]=B;
    digest[2]=C;
    digest[3]=D;
    resetMD4Registers();
    return digest;
}

uint32_t changeEndianness(uint32_t x){
    return ((x & 0xFF) << 24) | ((x & 0xFF00) << 8) | ((x & 0xFF0000) >> 8) | ((x & 0xFF000000) >> 24);
}

void setMD4Registers(uint32_t AA, uint32_t BB, uint32_t CC, uint32_t DD){
    A=AA;
    B=BB;
    C=CC;
    D=DD;
}

void resetMD4Registers(void){
    setMD4Registers(0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476);
}

Upvotes: 0

Views: 67

Answers (1)

John Bollinger
John Bollinger

Reputation: 180201

So why the random number of fails?

The MD4 code presented is not thread safe, and you are adding a bit of thread-unsafety of your own.

Observe in particular variables A, B, C, and D in file md4.c. These are declared at file scope and without the _Thread_local qualifier, so they have static storage duration and are shared by all threads in the process. These are modified during the computation, so you have data races involving all of these. The resulting behavior is undefined, and it shouldn't be hard to imagine how it might mess things up if multiple threads were clobbering the values that each other had written in those variables.

As for your own code, with each call to runprocs(), the main thread and each new one created all share the same struct data object, which the threads read and modify and the main thread reads, all without synchronization. This also causes undefined behavior, though it looks like this could be rescued by engaging a mutex or other synchronization mechanism.


Additionally, the MD4 code appears to be deterministic -- given the same input, it will always (if run single-threaded to avoid undefined behavior) produce the same output. It is therefore unclear what you seek to accomplish by running it in multiple threads on the same input.

Also, the while(!d.done) loop is pointless and poor form. You should be joining each thread via pthread_join() to clean up its resources after it, and since that has the (primary) effect of waiting for the thread to terminate, you don't need to also roll your own wait for termination.

Upvotes: 4

Related Questions