Reputation: 379
I have recently started relearning C++ as I have not been coding in C++ for more than a decade. I have rarely used the STL, even when I worked at SGI, and I want to master it. I have ordered a book and I am currently running different online tutorials.
One tutorial introduced std::find(begin(),end(),value)
and I was shocked at how slow it was in the test code I wrote. After doing some trial and error I found that s.find(value)
was clearly what I should be using.
Why is the first find in the code so dramatically slow?
set<int> s;
for (int i = 0; i < 100000; i++)
s.insert(rand());
for (int i = 0; i < 10000; i++) {
int r = rand();
//first find is about 1000x slower than the next one
auto iter1 = std::find(s.begin(), s.end(), r);
auto iter2 = s.find(r);
}
EDIT: added timing experiment results
@juanchopanza asked about timing in the comments so I timed std::find()
on Set, List, Vector and set.find()
(I only measured find - variation between runs was below 10%)
Vector performs much better than List or Set, but the specialized find from set wins with big data sets.
Elements Vector List Set | Set.Find()
10 0.0017 0.0017 0.0020 | 0.0017
100 0.0028 0.0051 0.0120 | 0.0019
1000 0.0105 0.0808 0.1495 | 0.0035
10000 0.0767 0.7486 2.7009 | 0.0068
100000 0.2572 2.4700 6.9636 | 0.0080
1000000 0.2674 2.5922 7.0149 | 0.0082
10000000 0.2728 2.6485 7.0833 | 0.0082
Upvotes: 3
Views: 3134
Reputation: 275310
The first reason is that std::find
is specified in terms of linear search. Meanwhile, std::set.find
is specified in terms of logarithmic time search.
But if you replaced std::find
with std::equal_range
, which will do a binary search, you'll find it is as slow as std::find
.
So I'll answer a better question than you asked:
std::equal_range
ridiculously slow on set iterators?Well, there really isn't a great reason.
std::set
iterators are bidirectional iterators. This means that they permit going forward one step, or backwards one step.
std::equal_range
on bidirectional iterators is extremely slow, because it has to walk step by step through the range.
The std::set.find
method, on the other hand, uses the tree structure of std::set
to find the element really fast. It can, basically, get midpoints of a range really fast.
C++ does not expose this tree structure when you access std::set
via its iterators. If it had, there might have been an operation like std::somewhere_between( start, finish )
which would in O(1) time get an iterator between start
and finish
, returning finish
if no such iterator exists.
Such an operation is actually really cheap on the tree structure implementation of std::set
.
However this operation doesn't exist. So std::equal_range( begin(set), end(set) )
is ridiculously slow.
Possibly not exposing an operation like std::somewhere_between
for sorted associative containers makes some set/map implementations more efficient; many used to use special nodes to replace some leaf cases. And maybe you'd need access to that special node to efficiently binary search the tree.
But I seriously doubt it is worth the lack of that operation. With that operation, you can work on a std::set
or std::map
subsection efficiently; without it, you got nothing.
Upvotes: 2
Reputation: 19113
To expand on my comment.
Because set::find
has more information about elements in the search range. It knows its (probably) implemented as a sorted binary tree and can search it in logarithmic time.
std::find
on the other hand only gets two bidirectional iterators, so the best it can do is basically just a for loop.
Had the set returned random-access iterator,
EDIT: Corrected my wrong claims.std::find
would have also been logarithmic.
Upvotes: 4
Reputation: 9837
std::find
is a generic algorithm that given a pair of iterators can find a value. And if all it has been given is a pair of iterators the best way to find a value is just to linearly search for it which is O(n).
set::find
is a member function of std::set
and so it knows the data structure its searching over and so can optimise the search. And sorted, balanced trees have excellent searching behavoir of O(log(n))
Upvotes: 8