Dinushan
Dinushan

Reputation: 2077

map<key,bool> vs set<key> to keep track uniqueness of a key collection

I write a text file from a const list of strings and I need to avoid duplicates (the list contains duplicates) .Which of these data structure is better(in terms of performance) to use to track already written strings,

map<string,bool>
set<string>

Now how I'm going to do this is,

foreach(string in list)
    if(not found in map/set)
       write to file
       insert to map/set
    endif
end

Or else is there alternative way of doing this?

Upvotes: 2

Views: 1641

Answers (4)

lip
lip

Reputation: 688

You do not really need another container, use algorithms:

std::vector<std::string> list = ...
std::sort(list.begin(), list.end());
std::unique(list.begin(), list.end());

// alternatively, copy to your file without changing source vector
std::unique_copy(list.begin(), list.end(), std::ostream_iterator(out_stream));

Whatever you do, you get an n.log complexity for operations (inserting in the map/set * n items). A map/set solution gets you 2.n.log operations for 2.n memory ; using algorithms get the job done with n+n.log operations and 1.n memory.

Upvotes: 1

juanchopanza
juanchopanza

Reputation: 227438

A map contains no entries with duplicate keys, so there is no point in using map<string,bool>. This is irrespective of performance. std::set<std::string> or std::unordered_set<std::string> would do the job. Here's an example:

std::vector<std::string> word_list = ....;
std::set<std::string> word_set;

for (const auto& s : work_list) // loop over words in word_list
{
  if(word_set.insert(s).second) // attempt to insert: fails if s already in set
  {
    // insertion succeeded: write to file
  }
}

Upvotes: 3

Ivaylo Strandjev
Ivaylo Strandjev

Reputation: 70939

In case you have the option to use c++11 I advise you to use unordered_set as it should perform asymptotically better than set. If this is not an option, then use set. There is no reason to use a map<string, bool> for this task.

Upvotes: 1

Jan Herrmann
Jan Herrmann

Reputation: 2767

You might gain a performance improvement with set<string> because map<string,bool> needs to store an additional bool value, which has at least size 1. Depending how the allocator and std::string are implemented this might result in larger memory consumtion (think of allignment) and cache misses. Look here for finding and inserting.

Upvotes: 1

Related Questions