user3139868
user3139868

Reputation: 403

C++ set with arbitrary comparator

I have the following C++ code

#include <set>
#include <string>
#include <iostream>
using namespace std;

class Pair {
  public:
    string lhs;
    string rhs;
    Pair();
    Pair( string l, string r ) {
      lhs=l;
      rhs=r;
    };
};

struct compare {
  bool operator()(const Pair& a, const Pair& b) const{
    if ( ( a.lhs == b.lhs && a.rhs == b.rhs ) || ( a.lhs == b.rhs && a.rhs == b.lhs ) ) {
      cout << "MATCH" << endl;
    }
    return ( a.lhs == b.lhs && a.rhs == b.rhs ) || ( a.lhs == b.rhs && a.rhs == b.lhs );
  }
};

int main () {
  set<Pair, compare > s;
  Pair p( string("Hello"), string("World") );
  s.insert(p);
  cout << s.size() << "\n";
  Pair q( string("World"), string("Hello") );
  s.insert(q);
  cout << s.size() << "\n";
  compare cmp;
  cout << cmp( p, q );

  return 0;
}

Invoking the compiled code gives:

1
MATCH
MATCH
2
MATCH

Somehow the set s ends up with both Pairs p, and q in spite of the fact that the comparator identifies them as identical. Why?

Any help will be much appreciated!

UPDATE:

Many thanks for the great answers and your kind and professional help. As you might have guessed already, I am quite a newby to C++.

Anyway, I was wondering, if Antoine's answer could be done with a lambda expression?

Something like:

std::set< …, [](){ my_comparator_code_here } > s;

????

Upvotes: 4

Views: 587

Answers (4)

Matthieu M.
Matthieu M.

Reputation: 300389

A comparator for a set, map or any algorithm such as lower_bound or sort which require an order need to implement a strict weak ordering (basically, behave like a <).

Such an ordering is required to have 3 properties:

  • irreflexive: not (a < a) is always true
  • asymmetric: a < b implies not (b < a)
  • transitive: a < b and b < c imply a < c

Which you will not < has.

Such an ordering defines equivalence classes, which are groups of elements that compare equal according to the ordering (that is not (a < b) and not (b < a) is verified). In a set or map, only a single element per equivalence class can be inserted whereas a multiset or multimap may hold multiple elements per equivalence class.

Now, if you look at your comparator, you will realize that you have implemented == which does not define any order at all. You need to implement something akin to < instead.

A simple, but extremely efficient trick, is to use tuples which have < (and == and any other comparison operator) already implemented in a lexicographical order. Thus, std::tuple<std::string, std::string> has exactly the order you which; and even better, std::tuple<std::string const&, std::string const&> also has it, and can be constructed very easily using std::tie.

Therefore, the implementation of a straightforward comparator is as simple as:

struct comparator {
    bool operator()(Pair const& left, Pair const& right) const {
        return std::tie( left.a,  left.b)
             < std::tie(right.a, right.b);
    }
};

Note: although not discussed much, it is absolutely essential that the ordering of the comparator be stable across calls. As such, it should generally only depend on the values of the elements, and nothing external or runtime-related (such as their addresses in memory)


EDIT: as noted, your comparator is slightly more complicated.

In your case, though, you also need to take into account that a and b have a symmetric role. In general, I would suggest uniquifying the representation in the constructor of the object; if not possible, you can uniquify first and compare second:

struct comparator {
    bool operator()(Pair const& left, Pair const& right) const {
        auto uleft = left.a < left.b ? std::tie(left.a, left.b)
                                     : std::tie(left.b, left.a);
        auto uright = right.a < right.b ? std::tie(right.a, right.b)
                                        : std::tie(right.b, right.a);

        assert(get<0>(uleft) <= get<1>(uleft) and "Incorrect uleft");
        assert(get<0>(uright) <= get<1>(uright) and "Incorrect uright");

        return uleft < uright;
    }
}; // struct comparator

Upvotes: 3

user3125280
user3125280

Reputation: 2829

from here

The set object uses this expression to determine both the order the elements follow in the container and whether two element keys are equivalent (by comparing them reflexively: they are equivalent if !comp(a,b) && !comp(b,a)). No two elements in a set container can be equivalent.

Sorry all jumped the gun becuase I disliked another answer. I will exapand and correct momentarily. AS pointed out, an order needs to be implemented. typcially this would be a lexicographical order. Importantly however you still need to make sure that the case for which you consider two pairs to be equal returns false for both cases.

if (( a.lhs == b.lhs && a.rhs == b.rhs ) || ( a.lhs == b.rhs && a.rhs == b.lhs )) return false;
//ordinary lexicographical compare
if( a.lhs < b.lhs) return true;
else if( a.lhs == b.lhs && a.rhs < b.rhs) return true;
else return false;

Notic the "!", simple. Your code is saying pair one is less than pair two which is less than pair one. You want it to say that neither is less than the other.

DISCLAIMER STILL WRONG ON A TECHNICALITY, ANTOINE'S IS THE CORRECT ONE

Upvotes: 1

Antoine
Antoine

Reputation: 14104

As Mark B said compare represents an ordering and not an equality, by default it is std::less. In your case, you don't want the comparison to depend on the order in your pair, but at the same time, your operator< must be satisfy a number of conditions.

All the answers here propose to change your specification and make the comparison order-dependant. But if you don't want that, here is the solution:

bool operator()(const Pair & a, const Pair & b) {
  const bool swapA = a.lhs < a.rhs;
  const std::string & al = swapA ? a.lhs : a.rhs;
  const std::string & ar = swapA ? a.rhs : a.lhs;
  const bool swapB = b.lhs < b.rhs;
  const std::string & bl = swapB ? b.lhs : b.rhs;
  const std::string & br = swapB ? b.rhs : b.lhs;
  return al < bl || (al == bl && ar < br);
}

At least, it works on your example, and the relation is reflexive and transitive.

Here is how it works: it is the lexicographic order for pairs: al < bl || (al == bl && ar < br), applied to sorted pairs.

In fact your data structure is a (set of size N) of (set of size 2). Internally, std::set sorts its elements using your comparison operators. For your "set of size 2" Pair you also need to consider them as internally sorted.

If the comparison code looks too heavy, you could move the pair sorting into the Pair class, like implement two methods min() and max(). Also, you implement operator< and then don't need a compare class:

struct Pair {
  string lhs, rhs;
  Pair();
  Pair( string l, string r ) : lhs(l), rhs(r) {}
  const std::string & min() const { return lhs < rhs ? lhs : rhs; }
  const std::string & max() const { return lhs < rhs ? rhs : lhs; }
  bool operator<(const Pair& b) const {
    return min() < b.min() || (min() == b.min() && max() < b.max());
  }
};

Upvotes: 2

Mark B
Mark B

Reputation: 96311

The comparison operator for a std::set (which is an ordered container) needs to identify a strict weak ordering not any arbitrary test you wish. Normally a properly implemented operator< does the job.

If your comparison operator does not provide a strict weak ordered (as yours does not) the behavior will be undefined. There is no way to work around this requirement of the C++ standard.

Note that in certain cases where an equality comparison is needed it will have to use the operator< twice to make the comparison.

Also have you considered using std::pair<std::string, std::string> instead of rolling your own?

I've reread your question about five times now and I'm starting to wonder if what you want is a set of pairs where which string is in first and second doesn't matter as far as the comparison goes. In that case @Antoine has what appears to be the correct solution for you.

Upvotes: 6

Related Questions