Tristan
Tristan

Reputation: 9

Is it always necessary to implement a hash table with dynamic memory allocation?

I have searched the web on this and asked my TAs and I haven’t received a very solid answer. I am testing different data structures on performance for inserting and searching for data. One of the data structures I am testing is a hash table.

I understand that dynamically allocating memory is beneficial If you don’t know the end size of the array and want to implement array double for any number of reasons. But say we know the amount of data. In this case 40,000 integer values. Do I need to dynamically allocate an array on the heap or can I get the same functionality and use from instantiating statically with the stack? I would do everything the same, create my header file, my implementation file, and main() but not dynamically allocate memory.

Upvotes: 1

Views: 1909

Answers (3)

user128511
user128511

Reputation:

It really depends on what you mean by "dynamic allocation".

A typical hash table contains an array of buckets (or array of pointers to buckets). Each bucket holds the collection of things that hashed to the same hash. So given a collection of 40000 int you want to put in the hash table you'll need some form of dynamic alloation for each bucket collection itself since you'll have no idea how many items will end up in each bucket.

If by dynamic alloction you mean "calls to malloc" then you could manage the memory for the hash table itself, the bucket collection, each individual bucket yourself using "placement new" or some other form of your own memory management. You'd calculate the amount of memory needed for your implementation of a hash table, plus the collection of buckets, plus the worst case for memory for the maximum number of used buckets which is 40000 since worst case is one entry per bucket unless your hash function's range in less than 40000

As an example a hash table could be something like

template <typename T> class Bucket {
  int size;
  T* items;
};

template <typename T> class HashTable {
  int size;  // optional if you know your hash function's range
  Bucket<T>* buckets;
};

So you need

max_buckets = min(hash_table_range, num_items);
bytesNeeded = sizeof(HashTable<T>)
            + num_hash_values * sizeof(Bucket<T>*)
            + max_buckets * sizeof(Bucket<T>)
            + sizeof(T) * num_items;

You'd still have to manage alloation of buckets (so there is "dyanmic allocation" in a sense) and you'd need to be able to shuffle the buckets around as they grow since there'd be no room to grow them in place which seems to me would add a ton of overhead. It's just that you're managing that allocation yourself so whether that's "dynamic allocation" is up to you.

Upvotes: 0

paxdiablo
paxdiablo

Reputation: 881403

If you know the maximum (or fixed) size that your hash table will be, you can very much statically allocate it.

The main reason for dynamic allocation is to vary the size at runtime and to be able to pass ownership of the structure from function to function. For example (and this is not tied to hash tables at all so I'll just use a generic structure):

Item *GetItem() {
    return new Item();
}

That dynamically allocates the item and passes it back to the caller (along with the ownership (responsibility for managing its lifetime)). A naive way of avoiding dynamic allocation would be:

Item *GetItem() {
    Item item;
    return &item;
}

Unfortunately, that address is unusable on return since item has gone out of scope at that point, so you could stop that from happening with:

Item *GetItem() {
    static Item item;
    return &item;
}

That preserves the object even after return from the function but has the restriction that only one copy of it ever exists. If you try to get two items from that function, they'll be the same one, and you may have bugs unless you realise that.

So, I would say that, provided you only need one hash table, you can avoid dynamic memory allocation simply by having a static copy of it.

Of course, aside from the single copy restriction, that also has the downside of requiring you to allocate the maximum space needed. That's not necessarily a bad idea (especially if it's fixed size) but it may cause inefficiencies.


In any case, you may be worrying about performance unnecessarily. You can just as easily dynamically allocate the structure in its maximum/fixed size up front (rather than doing lots of small allocations and de-allocations). Since that only really happens once per hash table (with hopefully a lot of use of the hash table before deallocating), the cost will be be relatively minor.

And it gives you back the ability to have more than one hash table in the program.

Upvotes: 1

mevets
mevets

Reputation: 10445

Good question. You can implement malloc on a system without dynamic memory allocation by simply putting a big empty table inside of malloc.c, and handling out portions of it until you ran out. Free would insert lumps back into this, so you might never run out.

A related way of allocating memory is via a frame allocator (sometimes a pool allocator); so named because it hands out access to N M-sized objects, rather than a true dynamic allocator where every object is a different size. Frame allocators are awesome because they are fast and simple -- a rare combination.

So, certainly you can work on your system without dynamically allocating, but you might need to ask yourself why? You could write it in terms of an unspecified allocator, which you could implement with a very simple frame allocator, then expand it as needed.

Upvotes: 0

Related Questions