Reputation: 1315
I would like to assign a unique object to a set of floating point values. To do so, I am considering two different options:
The first option is to maintain a static hash map (std::unordered_map<double, Foo*>) in the class and to prevent duplicates from being created in the first place. This means that instead of calling the constructor directly, I would check whether the value is already in the map and, if so, reuse the existing object. I would also need to remove the value from the hash map in the destructor.
The second option is to allow duplicate values during creation, and only sort them all at once and detect the duplicates after all values have been created. I guess I would need a hash map for that deduplication as well. Or would an ordered map (std::map) work just as well then?
Is there some reason to expect that the first option (which I like better) would be considerably slower in any situation? That is, would finding duplicate entries be much faster if I performed it on all entries at once rather than one entry at a time?
I am aware of the pitfalls of caching floating point numbers and will prevent NaNs and infinities from being added to the map. A few duplicate entries for the same constant would also not be a problem, should they occur - they would only result in a very small speed penalty.
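A minimal sketch of what I have in mind for the first option (class and member names are just illustrative):

```cpp
#include <unordered_map>

class Foo {
public:
    // Factory: return the existing object for this value, or create one.
    static Foo* get(double value) {
        auto it = cache_.find(value);
        if (it != cache_.end())
            return it->second;
        return new Foo(value);
    }

    ~Foo() { cache_.erase(value_); }  // remove ourselves from the cache

private:
    explicit Foo(double value) : value_(value) {
        cache_.emplace(value_, this);  // register the new object
    }

    double value_;
    static std::unordered_map<double, Foo*> cache_;
};

std::unordered_map<double, Foo*> Foo::cache_;
```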
Upvotes: 0
Views: 606
Reputation: 153889
Depending on the source and the possible values of the floating point numbers, a bigger problem might be defining a hash function which respects equality. (0, Inf and NaN are the problem values - most floating point formats have two representations for 0, +0.0 and -0.0, which compare equal; I think the same thing holds for Inf. And two NaNs always compare unequal, even when they have exactly the same bit pattern.)
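For illustration, a custom hash could normalize -0.0 to +0.0 before hashing, so both representations of zero land in the same bucket even if the underlying hash works on the bit pattern (a conforming std::hash<double> must already hash equal values equally, but the same idea applies to any hand-rolled hash):

```cpp
#include <cstddef>
#include <functional>

// Hash functor for double that respects equality: +0.0 and -0.0
// compare equal, so they must hash to the same value.
struct DoubleHash {
    std::size_t operator()(double d) const {
        if (d == 0.0)  // true for both +0.0 and -0.0
            d = 0.0;   // normalize to +0.0
        return std::hash<double>()(d);
    }
};

// Usage: std::unordered_map<double, Foo*, DoubleHash> cache;
```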
Other than that, in all questions of performance, you have to measure.
You don't indicate how big the set is likely to become. Unless it is enormous, if all values are inserted up front, the fastest solution is often to use push_back on an std::vector, then std::sort and, if desired, std::unique after the vector has been filled. In many cases, using an std::vector and keeping it sorted is faster even when insertions and removals are frequent. (When you get a new request, use std::lower_bound to find the insertion point; if the value at the location found is not equal, insert a new entry at that point.) The improved locality of std::vector largely offsets any additional costs due to moving the objects during insertion and deletion, and often even the fact that access is O(lg n) rather than O(1). (In one particular case, I found that the break-even point between a hash table and a sorted std::vector was around 100,000 entries.)
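As a sketch of the sorted std::vector approach (the incremental insert described above, plus the bulk sort-and-unique variant):

```cpp
#include <algorithm>
#include <vector>

// Insert 'value' into the sorted vector 'v', skipping duplicates.
// Returns an iterator to the (new or existing) entry.
std::vector<double>::iterator
insert_unique(std::vector<double>& v, double value) {
    auto pos = std::lower_bound(v.begin(), v.end(), value);
    if (pos == v.end() || *pos != value)
        pos = v.insert(pos, value);  // not present: insert in place
    return pos;
}

// Bulk variant: fill the vector first, then sort and drop duplicates.
void dedupe(std::vector<double>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}
```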
Upvotes: 2
Reputation: 247899
Have you considered actually measuring it?
None of us can tell you how the code you're considering will actually perform. Write the code, compile it, run it and measure how fast it runs.
Spending time trying to predict which solution will be faster is (1) a waste of your time, and (2) likely to yield incorrect results.
But if you want an abstract answer, it is that it depends on your use case.
If you can collect all the values and sort them once, that can be done in O(n lg n) time. If you insert the elements one at a time into a data structure with the performance characteristics of std::map, then each insertion will take O(lg n) time, and so performing n insertions will also take O(n lg n) time.
Inserting into a hash map (std::unordered_map) takes constant time on average, and so n insertions can be done in O(n). So in theory, for sufficiently large values of n, a hash map will be faster.
In practice, in your case, no one knows. Which is why you should measure it if you're actually concerned about performance.
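For example, a bare-bones harness along these lines will give you a first number (the size and workload here are placeholders; substitute your real data and time each candidate structure the same way):

```cpp
#include <chrono>
#include <cstdio>
#include <random>
#include <unordered_map>
#include <vector>

int main() {
    const std::size_t n = 1000000;  // placeholder size; vary this
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::vector<double> values(n);
    for (auto& v : values) v = dist(gen);

    auto start = std::chrono::steady_clock::now();
    std::unordered_map<double, int> m;  // stand-in for the real payload
    for (double v : values) m.emplace(v, 0);
    auto stop = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        stop - start).count();
    std::printf("%zu inserts: %lld ms\n", m.size(), (long long)ms);
}
```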
Upvotes: 0