Robert
Robert

Reputation: 2389

Fast map implementation in C++

I have a working implementation now that maps keys of ranges, like so:

class Range {
public:
    Range(int from, int to = -1) : _from(from), _to( to >= 0 ? to : from) {}
    bool operator < (const Range& item) {
        return _to < item._from;
    }
    bool operator == (const Range& item) {
        return item._from >= _from && item._to <= _to;
    }
private:
    int _from, _to;
};

typedef std::map<Range, MappedType> my_map_type;

Cool thing with this is that I can do:

my_map_type m;
m[Range(0, 20)] = Item1;
m[Range(30,40)] = Item2;

my_map_type::iterator it = m.find(15);
assert(it->second == Item1);
it = m.find(40);
assert(it->second == Item2);
it = m_find(25);
assert(it == m.end());

But I need a faster map implementation than std::map. Insertions are OK to be slow, but finds must be really quick. Tried boost::unordered_map but I cannot get it to work with Range class (although I've implemented boost::hash_value for Range objects). A find returns nothing (and operator== isn't even called during a find, which I find odd)

Ideas ?

Upvotes: 2

Views: 1975

Answers (3)

Giuseppe Ottaviano
Giuseppe Ottaviano

Reputation: 4633

You can't do this with an hash table, your definition of operator== can't be compatible with an hash function: In your code Range(10, 20) == Range(15, -1) but there is no way an hash function can return the same hash.

In general, hash and equality must be compatible: x == y must imply hash(x) == hash(y). Of course the converse is not true.

So you need some comparison-based structure, like the tree-based map. Instead of using a broken operator== which could give you problems, you can define a correct equality comparator and use map::lower_bound that does exactly what you are trying to do.

If it is too slow for you, you can use a sorted vector and use std::lower_bound. Search time is O(log n) which is asymptotically the same as std::map but much faster in practice (no pointer chasing, better locality). However it has linear update (insertion/deletion) time.

Otherwise you may want to have a look at specialized structures, like interval trees, but they are not implemented in the STL (maybe Boost?).

Anyway, the implicit Range(int) constructor is misleading and potentially harmful. You should declare it as explicit and use, for example, find(Range(40)) in place of find(40).

Upvotes: 5

Maxim Egorushkin
Maxim Egorushkin

Reputation: 136525

You are using std::map<> which is normally implemented as a red-black tree. boost::multi_index container also offers red-black tree based container but with compressed nodes (smaller by the size of a pointer), so it is going to be faster than std::map<> due to smaller working set.

Another option is to use a hash, so that some of you lookups when there are no hash collisions will be O(1).

Upvotes: 0

Victor Nicollet
Victor Nicollet

Reputation: 24587

What you're trying to do will not work, whether with std::map or any other container. Your operator < and operator == do not match the traditional requirements for these operators:

  • If a == b and a == c, then b == c : in your situation, [1,2] == [0,100] and [98,99] == [0,100] but obviously [1,2] != [98,99].
  • If a < b is true then b < a is false : in your situation, [2,4] < [1,3] is true but [1,3] < [2,4] is also true.

So, your std::map implementation will also fail in some situations.

Not only that, but std::map will only return one range while it could be expected that a given item could be within several ranges in the map.

If you can safely assume that none of the ranges overlap, then sort the ranges based on their from value alone, use upper_bound to extract the range with the larges from smaller than the value you are looking for, and compare with that range's to to determine if it's an actual match.

Upvotes: 1

Related Questions