Reputation: 1066
Is there an efficient implementation of a hash table, which maps key (integer) to values (string) and vice versa, for some compiled language?
Of course, one could always have two tables, one for key=>value mapping and other for value=>key. However that would not be very efficient, at least not memory-wise. Possibly both mappings can be in a single table, if the type system and intended usage allow it.
Upvotes: 2
Views: 2844
Reputation: 44250
This is the datastructureI usefor a bihash: The overhead is four ints (for the indexes) per entry.
In the example below, with typedef unsigned char Index;
, the overhead will be four characters, and the maximal table capacity will be 255.
/* For demonstration purposes these types are VERY SMALL.
** Normally one would use unsigned short, or unsigned int.
*/
typedef unsigned char Index;
typedef unsigned short Hashval;
/* Use the maximal representable value as sentinel value */
#define NIL ((Index)-1)
struct entry {
Index head_key /* The head-of-the-chain pointers */
, head_val; /* ... and for the values */
Index next_key /* linked list for chaining the keys */
, next_val; /* ... and for the values */
};
struct table {
unsigned totlen, keylen;
/* free points to the root of the freetree */
Index size, free;
/* The complete payload, for both keys and values.
* layout = [key0|val0|key1|val1|..] (without padding/alignment)
*/
char *data;
struct entry *entries; /* All the entries. Not pointers, but the actual entries. */
};
/* Macros for accessing the pool of payload */
#define NODE_KEY(p,n) ((p)->data + (n) * (p)->totlen)
#define NODE_VAL(p,n) ((p)->data + (n) * ((p)->totlen+(p)->keylen))
#define TH_OK 0
#define TH_KEY_NOT_FOUND 1
#define TH_VAL_NOT_FOUND 2
#define TH_BOTH_NOT_FOUND 3
#define TH_TABLE_FULL 4
#define TH_KEY_DUPLICATE 5
#define TH_VAL_DUPLICATE 6
#define TH_BOTH_DUPLICATE 7
#define TH_TOTAL_ECLIPSE 8
/********************************************/
/* Allocate and initialise the hash table.
** Note: given fixed size, the table and the payload could be statically allocated,
** (but we'd still need to do the initialisation)
*/
struct table * table_new( unsigned keylen, unsigned vallen, unsigned totcount )
{
Index idx;
struct table *this;
if (totcount > NIL) {
fprintf(stderr, "Table_new(%zu,%zu,%zu): totcount(%zu) larger than largest Index(%zu)\n"
, (size_t) keylen, (size_t) vallen, (size_t) totcount
, (size_t) totcount, ((size_t)NIL) -1 );
return NULL;
}
this = malloc (sizeof *this);
this->size = totcount;
this->keylen = keylen;
this->totlen = keylen+vallen;
this->data = malloc (totcount * this->totlen );
this->entries = malloc (totcount * sizeof *this->entries );
this->free = 0; /* start of freelist */
for( idx=0; idx < this->size; idx++ ) {
this->entries[idx].head_key = NIL;
this->entries[idx].head_val = NIL;
this->entries[idx].next_key = NIL;
this->entries[idx].next_val = idx+1; /* unused next pointer reused as freelist */
};
this-> entries[idx-1].next_val = NIL; /* end of freelist */
fprintf(stderr, "Table_new(%zu,%zu,%zu) size = %zu+%zu+%zu\n"
, (size_t) keylen, (size_t) vallen, (size_t) totcount
, sizeof *this, (size_t)totcount * this->totlen, totcount * sizeof *this->entries
);
return this;
}
Upvotes: 0
Reputation: 7576
One name for this is a BiMap (as in bidirectional). The obvious limitation is that keys will be distinct (like in a normal dictionary/map), but so will with values.
For Java, there's a StackOverflow question on it, but the general recommendation is the Guava BiMap.
For C and C++, Boost has a Bimap.
Internally, it's the "inefficient" implementation you mention where it keeps two hashtables. Here's the thing: it is efficient, and using twice as much memory for a secondary lookup structure is expected, and rarely a big deal.
Upvotes: 4