Blubber
Blubber

Reputation: 1483

How to keep track of visited points in C++

I am doing a problem in c++ that has to keep track of points that are visited in a traversal. The point is basically,

struct Point {
  int x;
  int y;
};

My first thought to solving something like this would be to use something like

std::set<Point> visited_points;

or maybe

std::map<Point, bool> visited_points;

However, I am a beginner in c++, and I realized you have to implement a Compare, which I didn't know how to do. When I asked, I was told said that using a map was "overkill" in a problem like this. He said the better solution was to do something like

std::vector<std::vector<bool>> visited_points;

He said std::map was not the best solution, since using a vector was faster.

I'm wondering why using a double vector is better in terms of style and performance. Is it because implementing a Compare is hard for a Point? A double vector feels hacky to me, and I also think it looks uglier than using a set or map. Is it really the best way to approach this problem, or is there a better solution I don't know about?

Upvotes: 0

Views: 2821

Answers (1)

rici
rici

Reputation: 241861

If someone asks you, in abstract, "What is the best way of keeping track of objects I've visited?", then you would be forgiven for replying "Use an std::unordered_set<Object>" (usually called a hash table for languages other than C++). That's a nice simple answer and it is often correct if you don't know anything at all about the objects. After all, a hash lookup is (expected) O(1), and in practice is usually quite fast.

There are a few caveats, the biggest one being that you will need to be able to compute a hash for each object. The C++ standard library does not (yet) come with a framework for computing hashes of arbitrary objects, not even PODs, and rendering an object as a string in order to be able to take advantage of std::hash<std::basic_string> is usually way too much work (unless the object is already a string, of course).

If you can't figure out how to write a hash function for you object, you might then think about using an ordered associative container (aka a balanced BST). However, that is not a good idea. Not because it is difficult to write a comparison function. Writing comparison functions is usually trivial, particularly for PODs; you can leverage the fact that std::tuple implements a comparison function for every tuple whose element types are all comparable.

The real problem with ordered associative containers is that they are high overhead. Element access is slow: O(log n), not O(1), and the constant is not small either. And the bookkeeping data required to maintain the balanced tree is much larger than the two-pointer hash-table node (and even that is quite big for small objects). So ordered associative containers really only make sense if you need to be able to traverse them in order. Generally, "visited" maps don't need to be traversed at all -- they are just used for lookup.

Both ordered and unordered containers have another problem: the objects in the container are individual dynamic memory allocations (the API requires that references to the objects in the container must be stable), so over time the individual objects end up getting scattered across dynamic memory, leading to a lot of cache misses.

But, really, even before you start thinking about how easy (or difficult) it will be to hash your objects in order to keep them in a hash-set, you should think about the nature of the objects you are tracking. In particular, can they be easily indexed with a small(-ish) integer? If so, you could just use a vector of bits, one bit per possible object. That's an efficient representation, both for access speed (definitely O(1)) and for space, and it is optimal for memory caching.

If your objects are easily numbered then bit-vectors will be an attractive alternative. One bit per object is (literally) two orders of magnitude less space than a hash-map, so unless you expect your visited map to be extremely sparse (rarely the case in algorithms which need a visited map), it's going to be a big win.

In the case of your problem, which I gather has to do with keeping track of points visited in a rectangular array such as a gameboard or an image, it is clear that the bit vector approach is going to work out well. It's true that you require two levels of indexing (unless you reduce the two indices into a single integer, which is quite easy if you know the dimensions), but that doesn't add much overhead.

Although there are doubts about how good an idea it was, the C++ standard library special cases std::vector<bool> to really be a bit vector. That makes it impossible to create a native pointer to a single element of the vector (which is why many people consider std::vector<bool> to be a hack), and creates some other odd issues when you try to use it as a vector. But if all you want is a bitmask -- as in the case of a visited map -- then it is a pretty good solution.

C++ also offers real bit vectors -- std::bitset -- but unfortunately these need to have their size known at compile time. Boost offers dynamic_bitset, which is a kind of std::vector<bool> written with hindsight, so it's also worth looking at.

Upvotes: 3

Related Questions