Reputation: 1315
I would like to assign a unique object to a set of floating point values. To do so, I am considering two different options:
The first option is to maintain a static hash map (std::unordered_map<double, Foo*>) in the class and to prevent duplicates from being created in the first place. This means that instead of calling the constructor directly, I would check whether the value is already in the map and, if so, reuse the existing object. I would also need to remove the value from the hash map in the destructor.
The second option is to allow duplicate values during creation, and only sort them all at once and detect the duplicates after all values have been created. I guess I would need a hash map for that deduplication as well. Or would an ordered map (std::map) work just as well then?
Is there some reason to expect that the first option (which I like better) would be considerably slower in any situation? That is, would finding duplicate entries be much faster if I performed it on all entries at once rather than one entry at a time?
I am aware of the pitfalls of caching floating point numbers and will prevent NaNs and infinities from being added to the map. A few duplicate entries for the same constant would also not be a problem, should they occur - they would only result in a very small speed penalty.
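A minimal sketch of what I have in mind for the first option (class and member names are just illustrative):

```cpp
#include <unordered_map>

class Foo {
public:
    // Factory: return the existing object for this value, or create one.
    static Foo* get(double value) {
        auto it = cache_.find(value);
        if (it != cache_.end())
            return it->second;
        return new Foo(value);
    }

    ~Foo() { cache_.erase(value_); }  // remove ourselves from the cache

private:
    explicit Foo(double value) : value_(value) {
        cache_.emplace(value_, this);  // register the new object
    }

    double value_;
    static std::unordered_map<double, Foo*> cache_;
};

std::unordered_map<double, Foo*> Foo::cache_;
```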
Upvotes: 0
Views: 606
Reputation: 153889
Depending on the source and the possible values of the floating point numbers, a bigger problem might be defining a hash function which respects equality. (0, Inf and NaN are the problem values - most floating point formats have two representations for 0, +0.0 and -0.0, which compare equal; I think the same thing holds for Inf. And two NaNs always compare unequal, even when they have exactly the same bit pattern.)
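For illustration, a custom hash could normalize -0.0 to +0.0 before hashing, so both representations of zero land in the same bucket even if the underlying hash works on the bit pattern (a conforming std::hash<double> must already hash equal values equally, but the same idea applies to any hand-rolled hash):

```cpp
#include <cstddef>
#include <functional>

// Hash functor for double that respects equality: +0.0 and -0.0
// compare equal, so they must hash to the same value.
struct DoubleHash {
    std::size_t operator()(double d) const {
        if (d == 0.0)  // true for both +0.0 and -0.0
            d = 0.0;   // normalize to +0.0
        return std::hash<double>()(d);
    }
};

// Usage: std::unordered_map<double, Foo*, DoubleHash> cache;
```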
Other than that, in all questions of performance, you have to measure.
You don't indicate how big the set is likely to become. Unless it is enormous, if all values are inserted up front, the fastest solution is often to use push_back on an std::vector, then std::sort and, if desired, std::unique after the vector has been filled. In many cases, using an std::vector and keeping it sorted is faster even when insertions and removals are frequent. (When you get a new request, use std::lower_bound to find the insertion point; if the value at the location found is not equal, insert a new entry at that point.) The improved locality of std::vector largely offsets any additional costs due to moving the objects during insertion and deletion, and often even the fact that access is O(lg n) rather than O(1). (In one particular case, I found that the break-even point between a hash table and a sorted std::vector was around 100,000 entries.)
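As a sketch of the sorted std::vector approach (the incremental insert described above, plus the bulk sort-and-unique variant):

```cpp
#include <algorithm>
#include <vector>

// Insert 'value' into the sorted vector 'v', skipping duplicates.
// Returns an iterator to the (new or existing) entry.
std::vector<double>::iterator
insert_unique(std::vector<double>& v, double value) {
    auto pos = std::lower_bound(v.begin(), v.end(), value);
    if (pos == v.end() || *pos != value)
        pos = v.insert(pos, value);  // not present: insert in place
    return pos;
}

// Bulk variant: fill the vector first, then sort and drop duplicates.
void dedupe(std::vector<double>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}
```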
Upvotes: 2
Reputation: 247899
Have you considered actually measuring it?
None of us can tell you how the code you're considering will actually perform. Write the code, compile it, run it and measure how fast it runs.
Spending time trying to predict which solution will be faster is (1) a waste of your time, and (2) likely to yield incorrect results.
But if you want an abstract answer, it is that it depends on your use case.
If you can collect all the values and sort them once, that can be done in O(n lg n) time. If you insert the elements one at a time into a data structure with the performance characteristics of std::map, then each insertion will take O(lg n) time, and so performing n insertions will also take O(n lg n) time.
Inserting into a hash map (std::unordered_map) takes constant time on average, and so n insertions can be done in O(n). So in theory, for sufficiently large values of n, a hash map will be faster.
In practice, in your case, no one knows. Which is why you should measure it if you're actually concerned about performance.
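For example, a bare-bones harness along these lines will give you a first number (the size and workload here are placeholders; substitute your real data and time each candidate structure the same way):

```cpp
#include <chrono>
#include <cstdio>
#include <random>
#include <unordered_map>
#include <vector>

int main() {
    const std::size_t n = 1000000;  // placeholder size; vary this
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::vector<double> values(n);
    for (auto& v : values) v = dist(gen);

    auto start = std::chrono::steady_clock::now();
    std::unordered_map<double, int> m;  // stand-in for the real payload
    for (double v : values) m.emplace(v, 0);
    auto stop = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        stop - start).count();
    std::printf("%zu inserts: %lld ms\n", m.size(), (long long)ms);
}
```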
Upvotes: 0