Reputation: 2077
I write a text file from a const list of strings and I need to avoid duplicates (the list contains duplicates) .Which of these data structure is better(in terms of performance) to use to track already written strings,
map<string,bool>
set<string>
Now how I'm going to do this is,
foreach(string in list)
if(not found in map/set)
write to file
insert to map/set
endif
end
Or else is there alternative way of doing this?
Upvotes: 2
Views: 1641
Reputation: 688
You do not really need another container, use algorithms:
std::vector<std::string> list = ...
std::sort(list.begin(), list.end());
std::unique(list.begin(), list.end());
// alternatively, copy to your file without changing source vector
std::unique_copy(list.begin(), list.end(), std::ostream_iterator(out_stream));
Whatever you do, you get an n.log complexity for operations (inserting in the map/set * n items). A map/set solution gets you 2.n.log operations for 2.n memory ; using algorithms get the job done with n+n.log operations and 1.n memory.
Upvotes: 1
Reputation: 227438
A map contains no entries with duplicate keys, so there is no point in using map<string,bool>
. This is irrespective of performance. std::set<std::string>
or std::unordered_set<std::string>
would do the job. Here's an example:
std::vector<std::string> word_list = ....;
std::set<std::string> word_set;
for (const auto& s : work_list) // loop over words in word_list
{
if(word_set.insert(s).second) // attempt to insert: fails if s already in set
{
// insertion succeeded: write to file
}
}
Upvotes: 3
Reputation: 70939
In case you have the option to use c++11 I advise you to use unordered_set
as it should perform asymptotically better than set
. If this is not an option, then use set
. There is no reason to use a map<string, bool>
for this task.
Upvotes: 1
Reputation: 2767
You might gain a performance improvement with set<string>
because map<string,bool>
needs to store an additional bool value, which has at least size 1. Depending how the allocator and std::string are implemented this might result in larger memory consumtion (think of allignment) and cache misses. Look here for finding and inserting.
Upvotes: 1