Reputation: 13
How is it possible to implement deletion of a key-value pair from a hash table with O(1) worst-case time complexity if collision resolution uses separate chaining?
I have no idea how to do it in constant time.
Upvotes: 1
Views: 563
Reputation: 186668
You can't have O(1) in the worst case. O(1) can only appear as an estimate of the average, expected time: provided that the hash function is good enough that we don't get too many hash collisions, we can guarantee O(1) on average. In the worst case the picture is different. In the general case (when no restrictions are imposed), a well-informed adversary can supply n items which all have the same hash value. To delete an item, we then have to resolve n - 1 hash collisions, and in the general case (no binary search etc.) that means performing n - 1 comparisons. A well-informed adversary (who knows our algorithm) can always ask to delete the item which is resolved last. So, in the general case, we have O(n - 1) == O(n) worst-case time complexity for any hash table implementation.
In your particular case (separate chaining) the worst-case example is when all n items end up in the same chain.
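A tiny illustration of that worst case (the class and the choice of adversarial keys below are only for demonstration, not from the answer): with separate chaining, keys crafted to share a hash value all land in one chain, so deleting the key that is resolved last costs a linear scan.

```python
# Minimal sketch: a separate-chaining table where adversarial keys all share a
# bucket, so delete() must scan the whole chain in the worst case.

class ChainedHashTable:
    def __init__(self, nbuckets=16):
        self.buckets = [[] for _ in range(nbuckets)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        self._chain(key).append((key, value))

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):   # worst case: n - 1 comparisons fail
            if k == key:
                del chain[i]
                return True
        return False

# In CPython, hash(i) == i for small non-negative ints, so multiples of 16
# all fall into bucket 0 and form a single chain of length n.
table = ChainedHashTable(nbuckets=16)
n = 1000
for i in range(n):
    table.insert(i * 16, i)
table.delete((n - 1) * 16)    # the adversary asks for the item resolved last
```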
Upvotes: 1
Reputation: 241701
In isolation, it's easy to do deletion in O(1).
You need to modify your items to allow the possibility of a "key-tombstone" pair. The tombstone carries no information other than its presence, so it can be implemented as a bit pattern that can never be a legal value. If no such bit pattern exists (for example, if every integer is a possible value), you can instead add a flag bit somewhere in the key-value-chain record.
Then the operations are modified as follows (in the complexity analysis, we assume that the chain for a key can be identified in O(1)); a sketch of the three operations follows the list.
DELETE(key): insert <key, tombstone> at the beginning of the chain for key. This is clearly O(1).
INSERT(key, value): Scan the chain for key. If an entry is found, replace it with <key, value>. Optionally, continue the scan, deleting any other entries with key key. If no entry is found, add <key, value> into the chain. Whether or not the optional deletion step is taken, this is O(C) where C is the chain length, expected O(1) and worst-case O(N).
FIND(key): Scan the chain for key until an entry with key key is found or the end of the chain is reached. If no entry is found or the entry found is a tombstone, return a "key not found" indication. Otherwise, return value from the entry which was found. Like INSERT, this is O(C) where C is the chain length.
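A minimal sketch of those three operations, assuming each chain is a collections.deque of (key, value) pairs and a unique sentinel object plays the role of the tombstone (the class and names are illustrative, not part of the answer):

```python
from collections import deque

TOMBSTONE = object()   # sentinel standing in for the "tombstone" value

class TombstoneChainedTable:
    def __init__(self, nbuckets=16):
        self.buckets = [deque() for _ in range(nbuckets)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def delete(self, key):
        # DELETE: push <key, tombstone> onto the front of the chain; O(1).
        self._chain(key).appendleft((key, TOMBSTONE))

    def insert(self, key, value):
        # INSERT: replace the first entry for key, else add one; O(chain length).
        # (The optional pass that removes later duplicates is omitted here.)
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.appendleft((key, value))

    def find(self, key):
        # FIND: the first entry for key wins; a tombstone means "not found".
        for k, v in self._chain(key):
            if k == key:
                if v is TOMBSTONE:
                    raise KeyError(key)
                return v
        raise KeyError(key)
```

appendleft is O(1) on a deque, which is what keeps DELETE constant time; a singly linked list with head insertion would serve the same purpose.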
That's a bit of a cheat since while DELETE is certainly O(1), the consequence of a DELETE is reflected in longer chain lengths. (With this algorithm, there may end up being many redundant entries for a given key, so the size of the table is no longer the same as the number of distinct keys.) In amortised complexity analysis, that would be reflected in increasing the amortised cost of DELETE to account for the distributed increased cost. However, there are ways to reduce this amortised charge:
Instead of unconditionally inserting the tombstone on DELETE, you can search the first k entries in the chain, where k is some configuration constant. If the entry is found in one of the first k entries in the chain, then that entry is replaced with a tombstone, unless it already is one. For any constant k, this is still O(1), but for reasonable values of k, it avoids increasing the size of the table for delete.
Limit the cost of INSERT by also cutting off the search after k elements of the chain have been examined (and skipping the optional duplicate-removal scan). The result is that INSERT is also worst-case O(1), but more garbage will build up. Both bounded variants are sketched below.
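A sketch of the two bounded variants, reusing TombstoneChainedTable and TOMBSTONE from the previous sketch; the cut-off constant K and the fallback of prepending a tombstone when the key is not found within the first K entries are assumptions, not something the answer spells out:

```python
K = 4   # arbitrary cut-off constant, for illustration only

def bounded_delete(table, key):
    # Scan at most K entries; replace the entry with a tombstone if found
    # (unless it already is one), otherwise fall back to prepending a
    # tombstone. Still O(1) for any constant K.
    chain = table._chain(key)
    for i, (k, v) in enumerate(chain):
        if i >= K:
            break
        if k == key:
            if v is not TOMBSTONE:
                chain[i] = (key, TOMBSTONE)
            return
    chain.appendleft((key, TOMBSTONE))

def bounded_insert(table, key, value):
    # Scan at most K entries; replace if found, otherwise prepend.
    # Worst-case O(1), at the cost of possibly leaving duplicates ("garbage").
    chain = table._chain(key)
    for i, (k, _) in enumerate(chain):
        if i >= K:
            break
        if k == key:
            chain[i] = (key, value)
            return
    chain.appendleft((key, value))
```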
With a careful implementation of REHASH, the accumulated garbage can be removed, but there is a small cost. With a classic hash table, REHASH takes O(N), because it can be assumed that each key only occurs once. In that case, it is not necessary to search for a key during REHASH; each key in the old table can simply be inserted at the beginning of its chain in the new table. That's not the case with the tombstone-based deletion algorithms, because keys can be repeated; to control the garbage, it's important to deal with repeated keys.
A naïve algorithm would take worst-case O(N²) to remove the duplicates, but there is an obvious O(N log N) algorithm, in which the entries are stably sorted by key, making the duplicates consecutive. Each unique key can then be inserted into the new table without further checking (and only if its surviving entry holds a value rather than a tombstone). I honestly don't know how that changes the amortised complexity of rehashing: it must be more than O(1), but it seems like it shouldn't be much more.
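A sketch of such a rehash, again reusing the earlier TombstoneChainedTable and TOMBSTONE; it assumes the keys are mutually orderable so that a stable sort by key makes duplicates consecutive, with the newest entry for each key kept first:

```python
def rehash(table, new_nbuckets):
    # Collect every entry. With the operations above, the newest entry for a
    # key sits earliest in its chain (DELETE/INSERT prepend or replace in
    # place), and all duplicates of a key live in the same bucket.
    entries = [pair for chain in table.buckets for pair in chain]

    # Stable sort by key (assumes orderable keys): duplicates become
    # consecutive, and stability keeps the newest entry of each run first.
    entries.sort(key=lambda kv: kv[0])

    new_table = TombstoneChainedTable(new_nbuckets)
    previous = object()                      # sentinel equal to no real key
    for k, v in entries:
        if k == previous:
            continue                         # older duplicate: garbage, drop it
        previous = k
        if v is not TOMBSTONE:               # live entry: carry it over
            new_table._chain(k).appendleft((k, v))
    return new_table
```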
Upvotes: 1