Comparing std::vector or std::set in this case for time complexity - More efficent?

Question

I currently have a function that returns strings. I need to keep a track of these returned strings and if an action is not taken on a returned string then I have to take an action on it.

My first thought is using a vector (i.e) std::vector.

Here is what a mechanism utilizing a vector would look like

1-Check if item exists in a vector using std::find

std::find(vector.begin(), vector.end(), item)!=vector.end()

2-If item not present do a push_back (Amortized constant) and perform an action on it else ignore the string

My second thought is using a std::set

1-Check if item exists in set by doing the insert function if not insert it

 if(set.insert(somestring).second)
    {
      //Item inserted in set and it did not exist

    }

The time complexity of insert in set is O(logn). The push_back of vector is Amortized constant and if the vector isn't sorted(in this it isn't) std::find will be O(n). Is my assumption correct that for maximum efficiency i should be using a set here ? Is there anything that I might be missing ?

Richard Hodges · Accepted Answer

I used to work on a foreign exchange pricing system in a bank. Performance was of great interest to us. We used to have long discussions on optimal algorithms... And then one day we measured the performance with a profiling tool.... and we found that the actual algorithms used up 5% of the processing time. The remaining 95% was taken up in converting strings to doubles and doubles to strings when the system received and sent messages to and from the message bus.

Why do I write this? Just to illustrate that in almost all cases, the choice of your container is probably irrelevant. Your program is very unlikely to spend more than a fraction of its time finding items in maps, sets or vectors.

Write the code in the most elegant and maintainable way you can, using easily understood algorithms, and containers that naturally fit the design (sets and maps for things that need to be ordered, vectors for general storage, unordered sets and maps if order isn't important and your data sets are huge). If you need multiple ordered indexes on the same data then probably a vector for storage with sets of iterators/pointers for indexing (like a database).

Then, when it's finished, if your users are screaming at you that it's too slow (they won't - they're more concerned about it working reliably), profile the code and measure for bottlenecks. They will almost always be in the I/O.

If in the incredibly unlikely scenario that your code is spending 90% of its time managing collections of data, then it's time to rethink the algorithm because the design is probably inefficient - or you're writing a protein folding simulator.

If you're sure that the design is optimal, then maybe it's time to reconsider the type of the container.

There are only fundamentally 3 types - you can find the best solution by trial and error in less time than it takes to argue about it.

:-)

Comparing std::vector or std::set in this case for time complexity - More efficent?

Answers (1)

Related Questions