Daniel Valland
Daniel Valland

Reputation: 1107

Extendible Hashing Repointing Pointers After Directory Doubling

I need to understand the rules of re-pointing the pointers once the directory has doubled. Each bucket will have two times its current pointers, but my question is how to determine which of the directory entries to point to each of the buckets? Note: this is not a question of re-pointing the pointers after a bucket split in which local depth is lower than the current amount of bits used in the directory, in which the bit at local depth+1 would determine this.

Upvotes: 3

Views: 248

Answers (1)

creme332
creme332

Reputation: 1945

Let's consider an example where each bucket has a size of 4 and the least significant bits are used in directory.

But before, you have to understand the following terms:

  • Global depth of directory: Max # of bits needed to tell which bucket an entry belongs to. Since we are using least significant bits in the directory in our example, global depth = number of least significant bits needed to identify bucket.
  • Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.

After inserting some values, suppose the state of the directory and buckets is as follows:

enter image description here

The hash values (indicated with an asterisk) are shown inside the buckets (instead of the data) to facilitate understanding. In practice, the data instead of the hash value is placed in a bucket.

Now let's insert an entry with hash value 20. The last two bits of the binary value of 20 is 00 so 20* should have been inserted in Bucket A. However since Bucket A is full, this is not possible.

Doubling directory

To deal with this, we must increase global depth by 1. So the new global depth becomes 3 which means that we now have to consider the last 3 bits of a hash value to determine its location.

32* and 16* are kept in the same bucket because their last 3 bits are identical. (Same reason for putting 4*, 12*, and 20* in the same bucket.)

enter image description here

Note

  • If most significant bits (instead of least significant bits) were used in directory, then when doubling, you will have to consider the most significant bits to determine the location of the hash.
  • Directory is doubled by copying it over and adjusting pointer to split image page (use of least significant bits enables efficient doubling via copying of directory)

References

https://cse.buffalo.edu/~zzhao35/teaching/cse562_spring22/files/08-hashindex.pdf

https://www2.cs.sfu.ca/CourseCentral/354/lxwu/notes/chapter11.pdf

Upvotes: 0

Related Questions