Is it possible, using a certain hash function and method (the division method, or double hashing), to make a chained hash table that can be resized without having to reinsert (rehash) each element already in the table?
Upvotes: 3
Views: 1117
Reputation: 18316
I assume you're asking this question because you want to avoid the high cost of resizing a hash table. You want a hash table with guaranteed constant-time insertion (assuming no collision problems, of course). This can be done.
The trick is to incrementally initialize the next-size hash table while the current one is filling up. By the time you need it, it's ready.
Quick pseudo-code to add an element:
if state is resizing then
    smallTable = bigTable
    bigTable = new T[smallTable.length * 2] // if allocation zeroes the memory for us, we lose O(1)
    set state to zeroing
elseif state is zeroing then
    zero a small amount of the bigTable memory
    if done zeroing then set state to transferring
elseif state is transferring then
    copy a few values from the small table to the big table
    if done transferring then set state to resizing
end if
add new item to small table // the small table stays the live, complete copy
add new item to big table   // so the big table is already complete when the transfer finishes
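For comparison, here is a runnable sketch in Python (class and method names are my own, not from the pseudo-code) of the closely related incremental-rehashing approach: rather than duplicating every insert into both tables, each operation migrates a few buckets from the old table into the new one, and lookups consult both tables until the migration completes.

class IncrementalHashTable:
    MIGRATE_PER_OP = 4  # buckets migrated per operation; a tuning knob

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]  # chained buckets
        self.old_buckets = None   # non-None while a resize is in progress
        self.migrate_index = 0    # next old bucket to migrate
        self.size = 0

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _migrate_step(self):
        # Move a few buckets from the old table into the new one, so no
        # single operation ever pays the full rehash cost.
        if self.old_buckets is None:
            return
        for _ in range(self.MIGRATE_PER_OP):
            if self.migrate_index == len(self.old_buckets):
                self.old_buckets = None  # migration finished
                return
            for k, v in self.old_buckets[self.migrate_index]:
                self._bucket(self.buckets, k).append((k, v))
            self.old_buckets[self.migrate_index] = []
            self.migrate_index += 1

    def put(self, key, value):
        if self.old_buckets is None and self.size >= len(self.buckets):
            self.old_buckets = self.buckets  # start an incremental resize
            self.buckets = [[] for _ in range(2 * len(self.buckets))]
            self.migrate_index = 0
        self._migrate_step()
        self.remove(key)  # drop any stale copy so keys stay unique
        self._bucket(self.buckets, key).append((key, value))
        self.size += 1

    def get(self, key, default=None):
        self._migrate_step()  # lookups also help finish the migration
        for table in (self.buckets, self.old_buckets):
            if table is not None:
                for k, v in self._bucket(table, key):
                    if k == key:
                        return v
        return default

    def remove(self, key):
        removed = False
        for table in (self.buckets, self.old_buckets):
            if table is not None:
                chain = self._bucket(table, key)
                for i, (k, _) in enumerate(chain):
                    if k == key:
                        del chain[i]
                        removed = True
                        break
        if removed:
            self.size -= 1
        return removed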
Upvotes: 0
Reputation: 48516
You would still need to reinsert, but one way to make that cheaper is to store the hash value from before the modulus was applied alongside each element. That way you save a large part of the calculation cost of rehashing, since the hash function itself never has to run again.
With this approach, it would also be possible to shrink the table.
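For instance, a minimal sketch in Python (function names are hypothetical), assuming each entry is stored as a (full_hash, key, value) tuple so the cached pre-modulus hash can be reused:

def insert(buckets, key, value):
    full_hash = hash(key)  # the expensive part, computed exactly once
    buckets[full_hash % len(buckets)].append((full_hash, key, value))

def resize(buckets, new_capacity):
    new_buckets = [[] for _ in range(new_capacity)]
    for chain in buckets:
        for full_hash, key, value in chain:
            # Entries still move, but only a cheap modulus is paid per
            # entry; hash(key) is never recomputed.
            new_buckets[full_hash % new_capacity].append((full_hash, key, value))
    return new_buckets

Passing a new_capacity smaller than the current one shrinks the table by exactly the same mechanism.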
Upvotes: 2
Reputation: 36476
I can only assume the reason you want to avoid rehashing everything is that the resulting high-latency operation is not a throughput problem but a responsiveness one (in either the human or the SLA sense).
In theory you could use a modified closed addressing hash table like so:
- keep a list of internal hash tables rather than a single one;
- on a 'resize', allocate a new, larger internal table and keep the old ones;
- direct inserts only to the newest (largest) internal table;
- have lookups and removals probe every internal table;
- drop an older internal table once churn has emptied it.
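A minimal sketch of that multi-table idea in Python (my own reconstruction, with hypothetical names), where a "resize" just pushes a bigger internal table and nothing already stored is ever rehashed:

class MultiTableHash:
    def __init__(self, capacity=8):
        self.tables = [[[] for _ in range(capacity)]]

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def put(self, key, value):
        self.remove(key)  # drop stale copies wherever they live
        newest = self.tables[-1]
        self._bucket(newest, key).append((key, value))
        # Linear scan kept for brevity; a real table would track its count.
        if sum(len(b) for b in newest) > len(newest):
            self.tables.append([[] for _ in range(2 * len(newest))])

    def get(self, key, default=None):
        for table in reversed(self.tables):  # probe newest first
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return default

    def remove(self, key):
        for table in self.tables:
            chain = self._bucket(table, key)
            chain[:] = [(k, v) for k, v in chain if k != key]
        # Churn gradually empties the older tables; truly releasing their
        # memory still requires a rehash, as noted below.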
Downgrades in the size of the hash will only function in the desired manner (actually releasing memory) if you are willing to rehash. This is unavoidable.
You could possibly use some complex additional data within an open addressing scheme to 'flag' which of the internal hashes each cell was in use by, but removals would be extremely complex to get right and very expensive unless you simply left the removed cells as wasted space. I would never attempt this.
I would not suggest using the former method either, unless the underlying data spends very little time in the hash, so that the resulting churn tends to steadily 'erase' the older-sized internal tables. Even then, a hash tuned for exactly this sort of behaviour and preset with an appropriate size would likely perform much better.
Since the above scheme simply trades wasted memory and throughput for a reduction in the expensive operations, with a speculative (at best) chance of recovering that waste, I would suggest that simply pre-sizing your hash to be larger than required, so it is never resized, is the more sensible option.
Upvotes: 1
Reputation: 46830
All hash tables must deal with collisions, whether through chaining or probing, so I suspect that if, upon resize, you simply grew the table without re-inserting everything, you would end up with a functional, though highly non-optimal, hash table.
Upvotes: 0
Reputation: 3349
Probably not: the hash function would have to avoid any variety of modulus, which would mean its required table size depended on the data anyway.
Upvotes: 1