Marek
Marek

Reputation: 87

How to implement a hash table with 2 keys?

I have a following problem: a single value, which can be associated with two different keys. Example:

So a query can be twofold:

table.find(uint64 key1) or
table.find(uint32 key2)

key1 and key2 are completely independent. Is there a possibility to implement ONE table having access through two keys WITHOUT duplicating the items?

One possible solution (psedocode):

class TwoKeyHashTable {
  Value find(uint64);
  Value find(uint32);

  insert(key1, key2, value) {
    insert_into_table(key1, value);
    insert_into_table(key2, value);
  }

  struct Item {
    uint64 key1;
    uint32 key2;
    Value value;
  } *table;
};

However this solutions doubles the number of items in the table. I have hundreds of millions of items and I want to keep the entire table in the memory, so I am asking if something more memory efficient exists?

Upvotes: 1

Views: 1640

Answers (1)

Marek
Marek

Reputation: 87

Wow, I am surprised that there are no ideas around... :-/ So I implemented the table, with duplicating of all items, as follows:

class TwoKeysHashTable {
 public:
  struct Item {
    uint64 key1;
    uint32 key2;
    int32 value;

    Item() : key1(0), key2(0) { }

    Item(uint64 k1, uint32 k2, int val)
       : key1(k1), key2(k2), value(val) { }
  };

  ...

  Item *Find(uint64 key) const;
  Item *Find(uint32 key) const;
  int Insert(uint64 key1, uint32 key2, int value);

 private:
  Item *FindKey1(uint64 key, int *return_index) const;
  Item *FindKey2(uint32 key, int *return_index) const;

  int GetNextIndex(int start_index, int counter) const;
  void Rehash();

  Item *array_;
  int number_key1_;
  int number_key2_;
  int allocated_items_;
};

In order not to duplicate the (real) data, it is stored in a compressed array, and 'Item::value' is the index to that array. 'Insert' calls both 'FindKey1' and 'FindKey2' to query if keys are already in the table and if not, new Items will be inserted at the returned indexes. I used open hashing to keep the array is compact as possible. Despite all the effort, the table is still using over 8GB of memory for my data (and I am not counting the actual data, the 'value' is pointing to).

Any ideas how to do it even more memory efficiently? Thanks...

Upvotes: 2

Related Questions