Anteru
Anteru

Reputation: 19374

How to create an efficient static hash table?

I need to create small-mid sized static hash tables from it. Typically, those will have 5-100 entries. When the hash table is created, all keys hashes are known up-front (i.e. the keys are already hashes.) Currently, I create a HashMap, which is I sort the keys so I get O(log n) lookup which 3-5 lookups on average for the sizes I care. Wikipedia claims that a simple hash table with chaining will result in 3 lookups on average for a full table, so that's not yet worth the trouble for me (i.e. taking hash%n as the first entry and doing the chaining.) Given that I know all hashes up-front, it seems to be that there should be an easy way to get a fast, static perfect hash -- but I couldn't find a good pointer how. I.e. amortized O(1) access with no (little?) additional overhead. How should I implement such a static table?

Memory usage is important, so the less I need to store, the better.

Edit: Notice that it's fine if I have have to resolve one collision or so manually. I.e. if I could do some chaining which on average has direct access and worst-case 3 indirections for instance, that's fine. It's not that I need a perfect hash.

Upvotes: 7

Views: 6410

Answers (5)

Zeros-N-Ones
Zeros-N-Ones

Reputation: 989

Tried to limit my approach to focus on the main requirements of this question which are: hashes are known up-front; 5-100 entries; memory efficiency; perfect hashing not required; and occasional collisions are acceptable.

Here’s a static table implementation:

public class StaticHashTable<V> {
    private static class Entry<V> {
        final int hash;
        final V value;
        Entry<V> next;
        
        Entry(int hash, V value) {
            this.hash = hash;
            this.value = value;
        }
    }
    
    private final Entry<V>[] table;
    private final int mask;
    
    @SuppressWarnings("unchecked")
    public StaticHashTable(Map<Integer, V> entries) {
        // Find the smallest power of 2 that's at least 1.5 times the size
        // This helps reduce collisions while keeping memory usage reasonable
        int idealSize = Math.max(8, (int)(entries.size() * 1.5));
        int size = 1;
        while (size < idealSize) {
            size <<= 1;
        }
        
        this.table = new Entry[size];
        this.mask = size - 1;
        
        // First pass: count collisions per slot to optimize chains
        int[] slotCounts = new int[size];
        for (Map.Entry<Integer, V> entry : entries.entrySet()) {
            int slot = entry.getKey() & mask;
            slotCounts[slot]++;
        }
        
        // Second pass: build chains in reverse order
        // This puts most frequently accessed items at the start of chains
        for (Map.Entry<Integer, V> entry : entries.entrySet()) {
            int hash = entry.getKey();
            int slot = hash & mask;
            
            Entry<V> newEntry = new Entry<>(hash, entry.getValue());
            Entry<V> existing = table[slot];
            
            if (existing == null || slotCounts[slot] <= 1) {
                // No collision, just insert
                table[slot] = newEntry;
            } else {
                // Insert at appropriate position in chain based on hash value
                if (existing.hash > hash) {
                    // Insert at start
                    newEntry.next = existing;
                    table[slot] = newEntry;
                } else {
                    // Find insertion point in chain
                    Entry<V> prev = existing;
                    while (prev.next != null && prev.next.hash < hash) {
                        prev = prev.next;
                    }
                    newEntry.next = prev.next;
                    prev.next = newEntry;
                }
            }
        }
    }
    
    public V get(int hash) {
        Entry<V> entry = table[hash & mask];
        while (entry != null) {
            if (entry.hash == hash) {
                return entry.value;
            }
            if (entry.hash > hash) {
                return null; // Not found (chains are sorted)
            }
            entry = entry.next;
        }
        return null;
    }
}

The above implementation provides several optimisations for your use case:

Uses a power-of-2 size with masking for fast modulo operations

Smart Collision Handling:

  • Uses chaining but keeps chains sorted by hash
  • Early exit when searching chains if we pass the target hash
  • Places most commonly accessed items earlier in chains

Memory Efficiency:

  • Only stores essential fields in entries
  • Uses a table size that's the smallest power of 2 >= 1.5 * number of entries
  • No additional metadata beyond the table itself

Performance Characteristics:

  • Best case: O(1) direct access
  • Average case: Close to O(1) with very short chains
  • Worst case: O(chain length), but chains are typically very short

Here's a usage example:

Map<Integer, String> entries = Map.of(
    123, "value1",
    456, "value2",
    789, "value3"
);
StaticHashTable<String> table = new StaticHashTable<>(entries);
String value = table.get(456); // Fast lookup

Upvotes: 0

paulv
paulv

Reputation: 406

This is a well studied problem. Dan Bernstein famously used constant databases with guaranteed performance in tinydns en qmail:

CDB and deratives use a external file for the k-v content, but there is no need to do this if the number a entries is very small.

Some papers that describe the algorithms en the math:

Upvotes: 4

LsH
LsH

Reputation: 55

Small hashes are also possible in C without an external lib using the pre-processor, for example:

swich (hash_string(*p))
{
case HASH_S16("test"):
    ...
    break;
case HASH_S256("An example with a long text!!!!!!!!!!!!!!!!!"):
    ...
    break;
}

Have a look for the code @ http://www.heeden.nl/statichashc.htm

Upvotes: 4

Fredrik Pihl
Fredrik Pihl

Reputation: 45644

For c or c++ you can use gperf

GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.

GNU gperf is highly customizable. There are options for generating C or C++ code, for emitting switch statements or nested ifs instead of a hash table, and for tuning the algorithm employed by gperf.

Upvotes: 5

Ted Hopp
Ted Hopp

Reputation: 234795

You can use Sux4j to generate a minimal perfect hash in Java or C++. (I'm not sure you are using Java, but you mentioned HashMap, so I'm assuming.) For C, you can use the cmph library.

Upvotes: 0

Related Questions