Ves
Ves

Reputation: 379

Why is std::find( s.begin(), s.end(), val ) 1000x slower than s.find(val) for a set<int> s?

I have recently started relearning C++ as I have not been coding in C++ for more than a decade. I have rarely used the STL, even when I worked at SGI, and I want to master it. I have ordered a book and I am currently running different online tutorials.

One tutorial introduced std::find(begin(),end(),value) and I was shocked at how slow it was in the test code I wrote. After doing some trial and error I found that s.find(value) was clearly what I should be using.

Why is the first find in the code so dramatically slow?

set<int> s;

for (int i = 0; i < 100000; i++)
    s.insert(rand());

for (int i = 0; i < 10000; i++) {
    int r = rand();

    //first find is about 1000x slower than the next one
    auto iter1 = std::find(s.begin(), s.end(), r);
    auto iter2 = s.find(r);
}

EDIT: added timing experiment results

@juanchopanza asked about timing in the comments so I timed std::find() on Set, List, Vector and set.find() (I only measured find - variation between runs was below 10%)

Vector performs much better than List or Set, but the specialized find from set wins with big data sets.

 Elements  Vector     List      Set    | Set.Find()
      10   0.0017    0.0017    0.0020  |  0.0017
     100   0.0028    0.0051    0.0120  |  0.0019
    1000   0.0105    0.0808    0.1495  |  0.0035
   10000   0.0767    0.7486    2.7009  |  0.0068
  100000   0.2572    2.4700    6.9636  |  0.0080
 1000000   0.2674    2.5922    7.0149  |  0.0082
10000000   0.2728    2.6485    7.0833  |  0.0082

Upvotes: 3

Views: 3134

Answers (3)

Yakk - Adam Nevraumont
Yakk - Adam Nevraumont

Reputation: 275310

The first reason is that std::find is specified in terms of linear search. Meanwhile, std::set.find is specified in terms of logarithmic time search.

But if you replaced std::find with std::equal_range, which will do a binary search, you'll find it is as slow as std::find.

So I'll answer a better question than you asked:

Why is std::equal_range ridiculously slow on set iterators?

Well, there really isn't a great reason.

std::set iterators are bidirectional iterators. This means that they permit going forward one step, or backwards one step.

std::equal_range on bidirectional iterators is extremely slow, because it has to walk step by step through the range.

The std::set.find method, on the other hand, uses the tree structure of std::set to find the element really fast. It can, basically, get midpoints of a range really fast.

C++ does not expose this tree structure when you access std::set via its iterators. If it had, there might have been an operation like std::somewhere_between( start, finish ) which would in O(1) time get an iterator between start and finish, returning finish if no such iterator exists.

Such an operation is actually really cheap on the tree structure implementation of std::set.

However this operation doesn't exist. So std::equal_range( begin(set), end(set) ) is ridiculously slow.

Possibly not exposing an operation like std::somewhere_between for sorted associative containers makes some set/map implementations more efficient; many used to use special nodes to replace some leaf cases. And maybe you'd need access to that special node to efficiently binary search the tree.

But I seriously doubt it is worth the lack of that operation. With that operation, you can work on a std::set or std::map subsection efficiently; without it, you got nothing.

Upvotes: 2

Quimby
Quimby

Reputation: 19113

To expand on my comment.

Because set::find has more information about elements in the search range. It knows its (probably) implemented as a sorted binary tree and can search it in logarithmic time.

std::find on the other hand only gets two bidirectional iterators, so the best it can do is basically just a for loop. Had the set returned random-access iterator, std::find would have also been logarithmic. EDIT: Corrected my wrong claims.

Upvotes: 4

Mike Vine
Mike Vine

Reputation: 9837

std::find is a generic algorithm that given a pair of iterators can find a value. And if all it has been given is a pair of iterators the best way to find a value is just to linearly search for it which is O(n).

set::find is a member function of std::set and so it knows the data structure its searching over and so can optimise the search. And sorted, balanced trees have excellent searching behavoir of O(log(n))

Upvotes: 8

Related Questions