Reputation: 373
So I'm really rusty with my C++, and I'm trying to implement a compact mapping container as an exercise. I'm trying to figure out the best way to let my class choose the best/smallest type for a dynamically allocated array. Further this type may change during the lifetime of the object.
The idea is that you break up a hash table into the table and the data:
table: [-1, -1, 2, -1, 1, 0]
data: [ [h0, key0, value0], [h1, key1, value1], [h2, key2, value2] ]
When preforming lookups, you hash the key % table size, and get the corresponding index in the data array. By keeping the actual hash table sparse and as small and light as possible, you can get some good caching speed etc. Also, the elements can be iterated over in order.
To keep the table small, I want it to use the smallest possible datatype to hold the indexes (with headroom to minimize collisions):
As more entries are added, the table array will need to resize and eventually change type. I'm trying to figure out the best way to do this in my container.
Since I'm going to have to be using the table array type in my definitions, I'm going to have to use some kind of polymorphism. I don't think a template will work, since type can change and won't be known at runtime.
I've read a little about unions and variants, but from what I understand of them, I don't think they'll work.
I know a bit of C, but I know that using void pointers is frowned upon in C++.
The best I've come up with is some kind of base class to tell my container that the table
arrays all support the same interface. But it seems like I'd be duplicating a lot of code and inserting some virtual function lookups for something I want to keep simple and fast as possible.
template <typename K, typename V>
struct Entry {
int hash;
V value;
K key;
};
class Table {
public:
virtual int operator[](int) =0;
}
class CharTable: public Table {
public:
CharTable(int s) :t{new char[s]}{}
int operator[](int i) { return t[i]; }
~CharTable() {delete t[];}
private:
char* t;
}
// short table etc...
template <typename K, typename V>
class CompactMapping {
public:
CompactMapping();
V& operator[](const K&);
unsigned int size() const {return sz;}
void resize(unsigned int);
private:
vector<Entry<K,V>> entries;
unsigned int sz;
Table* table;
int allocated;
}
template <typename K, typename V>
V& CompactMapping<K, V>::operator[](const K& key){
//simplified
int index = table[hash(key) % allocated];
if (entries[index].key == key){
return entries[index].value;
}
}
template <typename K, typename V>
void CompactMapping<K, V>::resize(unsigned int s){
if (s <= 2**7)
CharTable* t = new CharTable(s);
if (s <= 2**15)
ShortTable* t = new ShortTable(s);
//... maybe a switch statement instead
for (int i=0; i!=sz; ++i)
t[entries[i].hash %s] = i;
delete *table;
table = t;
allocated = s;
}
Full disclosure I haven't actually tested this, so the implementation might be off. I just want to know before I go down this road if my thinking is okay or if there's a better solution.
I'd also appreciate any other advice you guys could give me.
Upvotes: 0
Views: 194
Reputation: 119877
class CharTable: public Table
You probably want this instead:
template <class Index> class TypedTable : publlic Table { ... };
using CharTable = TypedTable<unsigned char>; // etc
This eliminates code duplication.
Now using a virtual call won't make the implementation win any speed competition, but you should profile first. If there's a significant bottleneck at the virtual call, try making your own dispatch mechanism using e.g. a switch
statement and see if it helps. Using a void
pointer is relatively benign as it is confined to a tightly controlled piece of code. Or you can use std::variant
or your own implementation of a tagged union.
Upvotes: 1