dmitry
dmitry

Reputation: 324

using a custom comparator with std::set

I'm trying to create a list of words read from a file arranged by their length. For that, I'm trying to use std::set with a custom comparator.

class Longer {
 public:
  bool operator() (const string& a, const string& b)
    { return a.size() > b.size();}
};

set<string, Longer> make_dictionary (const string& ifile){
  // produces a map of words in 'ifile' sorted by their length

  ifstream ifs {ifile};
  if (!ifs) throw runtime_error ("couldn't open file for reading");

  string word;
  set<string, Longer> words;

   while (ifs >> word){
     strip(word);
     tolower(word);
     words.insert(word);
 }

 remove_plurals(words);

 if (ifs.eof()){       
   return words;
  }
  else
    throw runtime_error ("input failed");
}

From this, I expect a list of all words in a file arranged by their length. Instead, I get a very short list, with exactly one word for each length occurring in the input:

polynomially-decidable
complexity-theoretic
linearly-decidable
lexicographically
alternating-time
finite-variable
newenvironment
documentclass
binoppenalty
investigate
usepackage
corollary
latexsym
article
remark
logic
12pt
box
on
a

Any idea of what's going on here?

Upvotes: 2

Views: 104

Answers (2)

leemes
leemes

Reputation: 45745

Your comparator only compares by length, that means that equally-sized but different strings are treated as being equivalent by std::set. (std::set treats them equally if neither a < b nor b < a are true, with < being your custom comparator function.)

That means your comparator should also consider the string contents to avoid this situation. The keyword here is lexicographic comparison, meaning you take multiple comparison criteria in account. The first criterion would be your string length, and the second would be the string itself. An easy way to write lexicographic comparison is to make use of std::tuple which provides a comparison operator performing lexicographic comparison on the components by overloading the operator<.

To make your "reverse" ordering of length, which you wrote with operator>, compatible with the usually used operator<, simply take the negative size of the strings, i.e. first rewrite a.size() > b.size() as -a.size() < -b.size(), and then compose it with the string itself into tuples, finally compare the tuples with <:

class Longer {
public:
    bool operator() (const string& a, const string& b)
    {
        return std::make_tuple(-a.size(),  a )
             < std::make_tuple(-b.size(),  b );
        //                     ^^^^^^^^^  ^^^
        //                       first    second
        //                   criterion    criterion
    }
};

Upvotes: 0

Oliver Charlesworth
Oliver Charlesworth

Reputation: 272812

With your comparator, equal-length words are equivalent, and you can't have duplicate equivalent entries in a set.

To maintain multiple words, you should modify your comparator so that it also performs, say, a lexicographic comparison if the lengths are the same.

Upvotes: 3

Related Questions