Matthew Chen
Matthew Chen

Reputation: 31

Memory leaks resizing a hash C

I'm trying to resize my open address hash library in C. When resizing the hash, e.g. from moving from size 5 -> size 11, I attempt to create a new hash, put all my old elements in to the new one and have my old hash point to this new hash.

The issue lies when trying to free this old hash during any resize, in resize_OPEN I can free the old hash keys, but I am not able to actually free the old_hash itself (free(old_H)) without seg faulting, so I'll always have the mem leak of the old hash. For each resize I'll leak 48 bytes of the hash object.

My put function and the resize_OPEN are where all the resizing happens.

void put(Hash *H, int cur_key, int cur_value) {
    int gen_key = 0;
    if (cur_key < 0)
        cur_key = cur_key * -1;
    gen_key = cur_key % H->cur_size;

    // Linear probing once around for a spot  
    k_v *new_node = createKV(cur_key, cur_value, 0);

    while (1) {
        // Inserting new key
        if (!H->key_value[gen_key] || (H->key_value[gen_key]->k == INT_MIN)) {
            H->num_elem++;
            H->key_value[gen_key] = new_node;
            break;
        // Overwriting key
        } else if (H->key_value[gen_key]->k == new_node->k) {
            swap(&H->key_value[gen_key], &new_node);
            free(new_node);
            return;
        }

        // Robin hood hashing

        // If the distance of the current key has probed less, swap and insert the curr key
        // now that the new key is inserted       
        if (H->key_value[gen_key]->distance < new_node->distance) {
            swap(&H->key_value[gen_key], &new_node);
            gen_key = cur_key % H->cur_size;
            // Don't increment distance until next check
            new_node->distance--;
        }
        gen_key++;
        new_node->distance++;  

        if (gen_key >= H->cur_size)
            gen_key = 0;

        // If we reach the probe limit, resize the hash
        if (new_node->distance >= H->probe_limit) {
            Hash *new_H = resize_OPEN(H);
            *H = *new_H;
            gen_key = new_node->k % H->cur_size; 
        }
    }

    if (H->num_elem >= H->to_resize) {
        if (H->type == OPEN_ADDR) {
            Hash *new_H = resize_OPEN(H);
            *H = *new_H;
        } else {
            resize(H);
        }
    }
    return; 
}

My resize function

Hash *resize_OPEN(Hash *old_H) {
    Hash *new_H = createHash(2 * old_H->cur_size);

    for (int i = 0; i < old_H->cur_size; i++) {
        if (old_H->key_value[i] && old_H->key_value[i]->k != INT_MIN) {
            int k = old_H->key_value[i]->k;
            int v = old_H->key_value[i]->v;
            put(new_H, k, v);
            free(old_H->key_value[i]);
        }
    }

    free(old_H->key_value);
    // How do I free the old hash here? free(old_H) will seg fault
    return new_H;
}

EDIT: I attempted to explain better, sorry for the confusion.

typedef struct k_v {
    int k;
    int v;
    int distance;
} k_v;

typedef struct Hash {
    int cur_size;
    int num_elem;      
    k_v **key_value;       // Open addressing           
    int to_resize;         // Elements to resize 
    int probe_limit;
    double load_factor; 
} Hash;

A memory leak example of when I insert 5 elements into a size 5 hash (one resize occuring)

Hash *createHash(int starting_size) {           
    // Get the next prime size
    int index = 0;
    for (; index < PRIME; index++) {
        if (prime_size[index] >= starting_size) {
            starting_size = prime_size[index];
            break;
        }
    }

    Hash *new_hash = (Hash *)malloc(SIZE_hash);

    new_hash->key_value = (k_v **)calloc(starting_size, SIZE_kv);
    new_hash->probe_limit = log_prime[index];
    new_hash->cur_size = starting_size;
    new_hash->num_elem = 0;
    new_hash->load_factor = DEFAULT_LF;  // Default load factor (change at 0.75 = N / size)

    new_hash->to_resize = resize_default[index];

    return new_hash;
}

Upvotes: 0

Views: 155

Answers (1)

chqrlie
chqrlie

Reputation: 144780

There is a memory leak in function resize_OPEN():

  • Since you were not passed the address of the H variable in the topmost calling context, resize_OPEN() should update the Hash structure *old_H by copying the contents of the new structure and free the newly allocated structure.

  • Note however that this method is rather disturbing and error prone, and it reallocates all key/value nodes. You could just allocate a larger array and redispatch the key/value nodes along the new hash size, and free the old table, thus removing the need for extra allocations and the uncanny struct copy.

  • Memory allocation failures are not tested. I suggest you use wrappers on malloc() and calloc() to test for failure and abort with an error message upon failure. This malloc() or die approach is better than undefined behavior. Such a wrappers are usually named xmalloc(), xcalloc()...

Here is a simple fix:

// resize_OPEN reallocates the hash table to a larger size
// memory allocation failures are not tested.
void resize_OPEN(Hash *old_H) {
    Hash *new_H = createHash(2 * old_H->cur_size);

    for (int i = 0; i < old_H->cur_size; i++) {
        if (old_H->key_value[i] && old_H->key_value[i]->k != INT_MIN) {
            int k = old_H->key_value[i]->k;
            int v = old_H->key_value[i]->v;
            put(new_H, k, v);
            free(old_H->key_value[i]);
        }
    }

    free(old_H->key_value);
    *old_H = *new_H;
    free(new_H);
}

Call this function this way from the put function:

        ...
        // If we reach the probe limit, resize the hash
        if (new_node->distance >= H->probe_limit) {
            resize_OPEN(H);
            gen_key = new_node->k % H->cur_size; 
        }
    }

    if (H->num_elem >= H->to_resize) {
        if (H->type == OPEN_ADDR) {
            resize_OPEN(H);
        } else {
            resize(H);
        }
    }
    ...

There might be other problems in your code, you posted most needed information but not a form that can be easily compiled and tested. Very few programmers can spot complex problems from reading code fragments alone. Furthermore some problems may hide in definitions and code elsewhere, so it is often impossible to pose the correct diagnostic without a Minimal, Complete, and Verifiable Example.

Upvotes: 1

Related Questions