JBentley
JBentley

Reputation: 6260

Is it bad practice for operator== to mutate its operands?

Scenario

I have a class which I want to be able to compare for equality. The class is large (it contains a bitmap image) and I will be comparing it multiple times, so for efficiency I'm hashing the data and only doing a full equality check if the hashes match. Furthermore, I will only be comparing a small subset of my objects, so I'm only calculating the hash the first time an equality check is done, then using the stored value for subsequent calls.

Example

class Foo
{
public:

   Foo(int data) : fooData(data), notHashed(true) {}

private:

   void calculateHash()
   {
      hash = 0; // Replace with hashing algorithm
      notHashed = false;
   }

   int getHash()
   {
      if (notHashed) calculateHash();
      return hash;
   }

   inline friend bool operator==(Foo& lhs, Foo& rhs)
   {
      if (lhs.getHash() == rhs.getHash())
      {
         return (lhs.fooData == rhs.fooData);
      }
      else return false;
   }

   int fooData;
   int hash;
   bool notHashed;
};

Background

According to the guidance on this answer, the canonical form of the equality operator is:

inline bool operator==(const X& lhs, const X& rhs);

Furthermore, the following general advice for operator overloading is given:

Always stick to the operator’s well-known semantics.

Questions

  1. My function must be able to mutate it's operands in order to perform the hashing, so I have had to make them non-const. Are there any potential negative consequences of this (examples might be standard library functions or STL containers which will expect operator== to have const operands)?

  2. Should a mutating operator== function be considered contrary to its well-known semantics, if the mutation doesn't have any observable effects (because there's no way for the user to see the contents of the hash)?

  3. If the answer to either of the above is "yes", then what would be a more appropriate approach?

Upvotes: 21

Views: 1624

Answers (6)

dchhetri
dchhetri

Reputation: 7136

You can go the mutable route, but I'm not sure if that is needed. You can do a local cache when needed without having to use mutable. For example:

#include <iostream>
#include <functional> //for hash

using namespace std;

template<typename ReturnType>
class HashCompare{
public:
    ReturnType getHash()const{
        static bool isHashed = false;
        static ReturnType cachedHashValue = ReturnType();
        if(!isHashed){
            isHashed = true;
            cachedHashValue = calculate();
        }
        return cachedHashValue;
    }
protected:
    //derived class should implement this but use this.getHash()
    virtual ReturnType calculate()const = 0;
};



class ReadOnlyString: public HashCompare<size_t>{
private:
    const std::string& s;
public:
    ReadOnlyString(const char * s):s(s){};
    ReadOnlyString(const std::string& s): s(s){}

    bool equals(const ReadOnlyString& str)const{
        return getHash() == str.getHash();
    }
protected:
    size_t calculate()const{
        std::cout << "in hash calculate " << endl;
        std::hash<std::string> str_hash;
        return str_hash(this->s);
    }
};

bool operator==(const ReadOnlyString& lhs, const ReadOnlyString& rhs){ return lhs.equals(rhs); }


int main(){
    ReadOnlyString str = "test";
    ReadOnlyString str2 = "TEST";
    cout << (str == str2) << endl;
    cout << (str == str2) << endl;
}

Output:

 in hash calculate 
 1
 1

Can you give me a good reason to keep as to why keeping isHashed as a member variable is necessary instead of making it local to where its needed? Note that we can further get away from 'static' usage if we really want, all we have todo is make a dedicated structure/class

Upvotes: 0

Niels Keurentjes
Niels Keurentjes

Reputation: 41958

Yes, introducing semantically unexpected side-effects is always a bad idea. Apart from the other reasons mentioned: always assume that any code you write will forever only be used by other people who haven't even heard of your name, and then consider your design choices from that angle.

When someone using your code library finds out his application is slow, and tries to optimize it, he will waste ages trying to find the performance leak if it is inside an == overload, since he doesn't expect it, from a semantic point of view, to do more than a simple object comparison. Hiding potentially costly operations within semantically cheap operations is a bad form of code obfuscation.

Upvotes: 1

gsf
gsf

Reputation: 7232

  1. Having side effect in the comparison function or operator is not recommended. It will be better if you can manage to compute the hash as part of the initialization of the class. Another option is to have a manager class that is responsible for that. Note: that even what seems as innocent mutation will require locking in multithreaded application.
  2. Also I will recommend to avoid using the equality operator for classes where the data structure is not absolutely trivial. Very often the progress of the project creates a need for comparison policy (arguments) and the interface of the equality operator becomes insufficient. In this case adding compare method or functor will not need to reflect the standard operator== interface for immutability of the arguments.
  3. If 1. and 2. seem overkill for your case you could use the c++ keyword mutable for the hash value member. This will allow you to modify it even from a const class method or const declared variable

Upvotes: 1

ya23
ya23

Reputation: 14496

You should never modify the object on comparison. However, this function does not logically modify the object. Simple solution: make hash mutable, as computing the hash is a form of cashing. See: Does the 'mutable' keyword have any purpose other than allowing the variable to be modified by a const function?

Upvotes: 3

Daniel Frey
Daniel Frey

Reputation: 56863

It seems like a perfectly valid usecase for a mutable member. You can (and should) still make your operator== take the parameters by const reference and give the class a mutable member for the hash value.

Your class would then have a getter for the hash value that is itself marked as a const method and that lazy-evaluates the hash value when called for the first time. It's actually a good example of why mutable was added to the language as it does not change the object from a user's perspective, it's only an implementation detail for caching the value of a costly operation internally.

Upvotes: 37

Cheers and hth. - Alf
Cheers and hth. - Alf

Reputation: 145249

Use mutable for the data that you want to cache but which does not affect the public interface.

U now, “mutate” → mutable.

Then think in terms of logical const-ness, what guarantees the object offers to the using code.

Upvotes: 12

Related Questions