chlorine
chlorine

Reputation: 33

C++ set: storing duplicates: confused about < operator

I'm quite new to C++ (but know my way around C) so I'm probably missing something obvious.

TLDR: I use a std::set which stores elements twice, which is definitely not what I want.

Long story: I've defined a class Clique and I need to store elements of this class in a set, so I've defined the < operator for Clique:

class Clique{
public :
  int b;
  int e;
  int l;
  std::set<int> X;

  bool operator <( const Clique &rhs ) const
  {
    if( b < rhs.b)
      return true;
    if( e < rhs.e)
      return true;
    if( X.size() < rhs.X.size() )
      return true;
    std::set<int>::iterator itX = X.begin();
    std::set<int>::iterator itrhs = rhs.X.begin();
    // both sets have same size, need only to check end for one of them                                                                                                                                            
    while( (*itX == *itrhs) && ( itX != X.end() ) ){
      ++itX;
      ++itrhs;
    }
    if( itX == X.end() ){
      //both sets are equal                                                                                                                                                                                        
      return false;
    }
    else
      return ( *itX < *itrhs );
  }

  void print_clique(FILE *F) const ;
};

(I wasn't sure how set comparison is done, so I wrote a routine for comparing them first by size, then element by element).

Now I want to store Clique elements in a set and this is where the problem appears. My std::set (1) does not appear to store Clique elements in the order I've defined; (2) stores several copies of the same Clique

I've written a function to print a set of Clique:

void print_cliqueset(std::set<Clique> mySet){
  int setsize = 0;

  std::set<Clique>::iterator it = mySet.begin();
  Clique cur_c = *it;
  Clique prev_c = *it;
  while( it != mySet.end() ){
  //  for( std::set<Clique>::iterator it = mySet.begin(); it != mySet.end(); ++it ){                                                                                                                               
    it->print_clique(stdout);
    setsize ++;
    ++it;
    if( it != mySet.end() ){
      cur_c = *it;
      assert ( prev_c < cur_c);
      gassert( prev_c.b <= cur_c.b );
    prev_c = *it;
    }
  }

  assert( setsize == mySet.size() );
}

My function is more complicated than needed but I wanted to make sure I understood what was going on.

Here is a typical output of printing such a set: There's a line for each Clique, in which I print first b, then e, then the elements in the set X.

6829 9716 1 2 3 5 8 9 10 
6792 9687 1 2 3 7 8 9 10 
606 6531 1 2 3 5 6 7 8 9 
6829 9687 1 2 3 5 7 8 9 10 
410 9951 2 6 
484 9805 1 2 4 6 
494 9805 2 4 6 10 
506 9805 1 2 5 6 
484 9821 1 2 4 
484 9871 2 3 4 6 
506 9821 1 2 5 
484 9802 1 2 3 4 6 
486 9805 1 2 4 6 9 
486 9802 1 2 3 4 6 9 
507 9802 1 2 3 4 6 9 10 
502 9802 1 2 3 4 6 10 
506 9802 1 2 3 5 6 
507 9806 1 2 4 9 10 
507 9805 1 2 5 6 9 
527 9806 1 2 5 9 10 

As we can see, the cliques are not at all sorted on the order I defined (or wanted to define). They should be sorted first by member b (which is the first of each line), and this is not the case at all.

Then I have some duplicate lines in the output (not appearing in the example above but present in the full output). I guess the fact that I have duplicates is not surprising given that it seems confused about the order...

I guess the answer is something fairly obvious but I fail to see it. Any help would be appreciated!

Upvotes: 3

Views: 129

Answers (4)

Richard Hodges
Richard Hodges

Reputation: 69882

Operator < must provide strict weak ordering. I.e. if x < y then !(y < x) and !(y == x).

In the case of Clique, the requirements seem to be that the elements b, e, and X are compared lexographically.

The idiomatic way to represent this is to do all comparisons in terms of operator<:

#include <set>

class Clique{
public :
    int b;
    int e;
    int l;
    std::set<int> X;

    bool operator <( const Clique &r ) const
    {
        auto const& l = *this;

        if (l.b < r.b) return true;
        if (r.b < l.b) return false;

        if (l.e < r.e) return true;
        if (r.e < l.e) return false;

        if (l.X < r.X) return true;
        if (r.X < l.X) return false;

        return false;
    }

    void print_clique(FILE *F) const ;
};

And yes, std::set really does provide operator< when the key type provides it.

Another way to write this, as Jarod was alluding to is this:

#include <set>
#include <tuple>

class Clique{
public :
    int b;
    int e;
    int l;
    std::set<int> X;

    bool operator <( const Clique &r ) const
    {
        auto const& l = *this;
        return std::tie(l.b, l.e, l.X) < std::tie(r.b, r.e, r.X);
    }

    void print_clique(FILE *F) const ;
};

Which I think you'll agree is concise, expressive, correct and idiomatic.

Upvotes: 1

Ethouris
Ethouris

Reputation: 1901

The definition of operator< should be such that for each pair of elements 'b' and 'e' the relationship b < e should be used to determine any kind of relationship. The following equivalences are in force here:

a > b <==> b < a

a == b <==> !(a < b) && !(b < a)

a >= b <==> `!(a < b)

And so on. If you use multiple fields to be checked for every relationship check, then you have a kind-of multidimensional ranges. Making a flat range out of that can be only done this way:

  • More significant field is checked first; if in this field values aren't equal, you return the result immediately
  • Otherwise - if they are equal - you check the next field in the significance order and so on.

The requirement of using this complicated relationship definition in the set makes things actually harder for you because all you should do is to state whether one element is less than the other. So in your case you'll have to check for equality inside by yourself. Your procedure checks the fields "next in significance chain" also if lhs.b > rhs.b.

Upvotes: 1

alexeykuzmin0
alexeykuzmin0

Reputation: 6440

Your operator< is broken. Consider two Cliques:

c1 is {b = 0, e = 1, ...}
c2 is {b = 1, e = 0, ...}

Your code will return true for both c1 < c2 and c2 < c1.

Obviously, in such situation std::set shows strange behavior.

I would fix your operator< in the following way:

bool operator <( const Clique &rhs ) const
{
    if( b != rhs.b)
        return b < rhs.b;
    if( e != rhs.e)
        return e < rhs.e;
    if( X.size() != rhs.X.size() )
        return X.size() < rhs.X.size();
    std::set<int>::iterator itX = X.begin();
    std::set<int>::iterator itrhs = rhs.X.begin();
    // both sets have same size, need only to check end for one of them
    while((itX != X.end()) && (itX == *itrhs)){
        ++itX;
        ++itrhs;
    }
    if( itX == X.end() ){
    //both sets are equal
        return false;
    }
    else
        return ( *itX < *itrhs );
}

Upvotes: 4

Jarod42
Jarod42

Reputation: 217283

Your bool operator <( const Clique &rhs ) const is wrong as it doesn't respect strict ordering.

It may simply be:

bool operator <(const Clique& rhs) const
{
    return std::tie(b, e, X) < std::tie(rhs.b, rhs.e, rhs.X);
}

Upvotes: 4

Related Questions