baldFemale
baldFemale

Reputation: 11

Create new hash table from existing hash table

Suppose we have a hash table with 2^16 keys and values. Each key can be represented as a bit string (e.g., 0000, 0000, 0000, 0000). Now we want to construct a new hash table. The key of new hash table is still a bit string (e.g., 0000, ****, ****, ****). The corresponding value would be the average of all values in the old hash table when * takes 0 or 1. For instance, the value of 0000, ****, ****, **** will be the average of 2^12 values in the old hash table from 0000, 0000, 0000, 0000 to 0000, 1111, 1111, 1111. Intuitively, we need to do C(16, 4) * 2^16 times to construct the new hash table. What's the most efficient way to construct the new hash table?

Upvotes: 0

Views: 133

Answers (1)

rici
rici

Reputation: 241861

The hash table here is not helping you at all, although it isn't much of a hindrance either.

Hash tables cannot, by their nature, cluster keys by the key prefix. In order to provide good hash distribution, keys need to be distributed as close to uniformly as possible between hash values.

If you will need later to process keys in some specific ordering, you might consider an ordered associative mapping, such as a balanced binary tree or some variant of a trie. On the other hand, the advantage of processing keys in order needs to be demonstrated in order to justify the additional overhead of ordered mapping.

In this case, every key needs to be visited, which means the ordered mapping and the hash mapping will both be O(n), assuming linear time traverse and constant time processing, both reasonable assumptions. However, during the processing each result value needs two accumulated intermediaries, basically a running total and a count. (There is an algorithm for "on-line" computation of the mean of a series, but it also requires two intermediate values, running mean and count. So although it has advantages, reducing storage requirements isn't one of them.)

You can use the output hash table to store one of the intermediate values for each output value, but you need somewhere to put the other one. That might be another hash table of the same size, or something similar; in any case, there is an additional storage cost

If you could traverse the original hash table in prefix order, you could reduce this storage cost to a constant, since the two temporary values can be recycled every time you reach a new prefix. So that's a savings, but I doubt whether it's sufficient to justify the overhead of an ordered associative mapping, which also includes increased storage requirements.

Upvotes: 1

Related Questions