intagli
intagli

Reputation: 310

Extendible Hashing, doubling the size of an array of pointers

I'm trying to implement Extendible Hashing in C++

There's a struct which acts as an Index and it contains an array of type 'Bucket'

Bucket * bucket_pointers;

There's another struct, Bucket, which has an array, which holds my values

E values[N] = {};

I've got a more or less working program, with one problem: Everytime I to double the size of my hash table, I'm copying all of my buckets into a new array (twice the size)

Index_0
Bucket <n= 3, local_depth=2, 0x100200000>
[12,4,,8,]
Index_1
Bucket <n= 0, local_depth=1, 0x100200028>
[,,,,]
Index_2
Bucket <n= 3, local_depth=2, 0x100200050>
[2,10,6,,]
Index_3
Bucket <n= 0, local_depth=1, 0x100200078>
[,,,,]

However, the Bucket with address 0x100200078 should actually point to the bucket with address 0x100200028, i.e. both indices (1 and 3) should point to the same bucket.

Here I'm deciding whether to split a bucket or double the size of my index...

while (!bucket_pointers[h%index_size].append(e)){ 
    if(bucket_pointers[h%index_size].local_depth<global_depth){
        split(hashValue);
    }
    else if(bucket_pointers[h%index_size].local_depth==global_depth){
        resize();
    }
}

I'm currently doubling the size of my array like this:

for (size_t i = 0; i < index_size;  ++i){
            for (size_t j = 0; j < bucket_pointers[i].n;  ++j){ 
                newBucket_pointers[i] = bucket_pointers[i];
                newBucket_pointers[i+index_size] = bucket_pointers[i];
            }
    }

Upvotes: 0

Views: 2670

Answers (3)

Chilly_Vanilly
Chilly_Vanilly

Reputation: 75

As @user2079303 already pointed out, what you want is an array of Bucket**.

Let me clarify this with some imagery:

Extendible-hashing explained

One thing to remember in case Bucket** index = new Bucket*[<size_here>] confuses you, say you want to make a simple int-array. You would do:

int* nums = new int[5];

Simply imagine to decrease the number of *-symbols on the right-side since that is defining what the content-type shall be. And so all you want to store is addresses to Buckets. Thus the index containing 1 or more pointer to Buckets.

Hope it helps!

Upvotes: 1

eerorika
eerorika

Reputation: 238401

Note that Bucket * bucket_pointers; is not an array of Bucket pointers as it's name would imply. It's a pointer to a Bucket (the first Bucket in an array of Buckets to be specific).

So, when you copy the array of buckets to another, you end up with identical copies of buckets each with their own values arrays.

newBucket_pointers[i] = bucket_pointers[i];
newBucket_pointers[i+index_size] = bucket_pointers[i];

If you want newBucket_pointers[i] and newBucket_pointers[i+index_size] to be pointers that point to the same Bucket then the type of bucket_pointers (and newBucket_pointers) should actually be Bucket**. Then bucket_pointers is a pointer to a Bucket* and bucket_pointers[i] is a pointer to a Bucket. That way bucket_pointers[i], newBucket_pointers[i] and newBucket_pointers[i+index_size] would point to the same Bucket. I recommend a std::vector<Bucket*> bucket_pointers instead though for easier memory management.

If instead, you intend to copy the Buckets as you do now but have their values member point to a shared array, then you can keep bucket_pointers as it is and you need to change the type of values to a pointer and allocate the array separately. If you want to share the array this way, you should probably use a shared_ptr to make the eventual deallocation easier.

Upvotes: 2

Michael J
Michael J

Reputation: 7939

I've included some code below that performs as a very simple hash table. It is for instructional purpose only and not robust enough for use in a real application. In real life use the built-in std::unordered_set which works much better.

I avoid the need to change the bucket size, by using a linked list as a bucket that can expand as needed.

Is this example helpful to set you on the right track?

#include <iostream>
#include <array>
#include <list>
#include <string>
#include <cassert>


class CTable
{
public:
    void Add(const std::string &sKey, int nVal);
    int  Find(const std::string &sKey);

protected:
    size_t Index(const std::string &sKey);

private:
    struct SData
    {
        SData(const std::string &s, int n)
        : sKey(s)
        , nVal(n)
        {
        }
        std::string sKey;
        int         nVal;
    };
    typedef std::list<SData> Bucket_t;
    enum { nBuckets = 24 };
    typedef std::array<Bucket_t, nBuckets> Table_t;
    Table_t m_table;

    const SData *Lookup(const Bucket_t &b, const std::string &sKey);
};

void CTable::Add(const std::string &sKey, int nVal)
{
    size_t nIndex = Index(sKey);
    const SData *p = Lookup(m_table.at(nIndex), sKey);
    if (p)
        throw std::runtime_error("duplicate key");
    m_table.at(nIndex).push_back(SData(sKey, nVal));
}

int CTable::Find(const std::string &sKey)
{
    size_t nIndex = Index(sKey);
    const SData *p = Lookup(m_table.at(nIndex), sKey);
    if (p)
        return p->nVal;
    else
        throw std::runtime_error("not found");
}

size_t CTable::Index(const std::string &sKey)
{
    return std::hash<std::string>()(sKey) % m_table.size();
}

const CTable::SData *CTable::Lookup(const CTable::Bucket_t &b, 
                                    const std::string &sKey)
{
    for (const SData &s : b)
        if (s.sKey == sKey)
            return &s;
    return nullptr;
}


int main() 
{
    CTable t;

    t.Add("one", 1);
    t.Add("two", 2);
    t.Add("three", 3);

    assert(2 == t.Find("two"));

    try
    {
        t.Find("four");
        assert(false);
    }
    catch (std::exception &)
    {
    }
    try
    {
        t.Add("two", 3);
        assert(false);
    }
    catch (std::exception &)
    {
    }
    return 0;
}

Upvotes: 1

Related Questions