Reputation: 6875

Efficient way to store C++ object with multiple fields of different types in std::set/std::map

I have a general question. Let's assume I have a C++ class with multiple fields of different types. I want/need to store the objects of this class in std::set or std::map (in order to access them in O(log(N))).

In order to do that I need to overload operator<, BUT what if operator< has no logical meaning in my case? For example, I have a class faceDescription which contains fields like eye color, nose type etc.

The most obvious would be to implement operator< just by comparing each field like this:

if (fieldA < other.fieldA)
{
    return true;
}
else if (fieldA == other.fieldA && fieldB < other.fieldB)
...

and so on. But if I have many fields, this method will be too long, with too many branches, hard to read and probably hard to maintain.

I was thinking about "packing" all the fields into a buffer and then compare it with something like std::memcmp but the point is that some fields may be pointers or different classes/structs.

So my question:

Is there an efficient and generic way to define a "unique identifier" for the class (maybe with some std methods) based on the field values, so that this "unique identifier" could be used to compare/sort objects of that class?

EDIT

Just an example which may explain the motivation and should be clear for everyone:

Assume video processing with face recognition, where the program receives a face description object and has to count how many times each face appeared during the given video. There may be thousands/millions of faces. So the efficient way to do it is to maintain a map with the face description object as the key and the number of appearances as the value.

Thanks in advance!

Upvotes: 3

Answers (3)

dspfnder

Reputation: 1123

Have you considered using tuple?

// Map keyed by a tuple of fields (requires <map>, <string>, <tuple>)
std::map<std::tuple<int, char, float>, std::string> m;
m[std::make_tuple(31, 'd', 23.5f)] = "Just an idea";
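
Applied to the counting scenario from the question, the tuple idea might look like the sketch below. The key layout (eye color, nose type, age bracket) and the names FaceKey and countFaces are assumptions made up for illustration, not something from the question.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical key layout: (eye color, nose type, age bracket).
using FaceKey = std::tuple<std::string, int, char>;

// Counts how many times each key occurs in `faces`.
// std::tuple already provides a lexicographic operator<,
// so std::map needs no custom comparer here.
std::map<FaceKey, int> countFaces(const std::vector<FaceKey>& faces) {
    std::map<FaceKey, int> counts;
    for (const FaceKey& f : faces) {
        ++counts[f];  // operator[] value-initializes a missing entry to 0
    }
    return counts;
}
```

The trade-off is that tuple elements are accessed by position (std::get<0> and so on) rather than by name, which can hurt readability once there are many fields.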

Upvotes: 0

Matteo Italia

Reputation: 126787

Your question is actually more like three questions packed in one:

I need to overload `operator<` BUT what if `operator<` has no any logical meaning in my case?

You don't really need to overload operator<; just provide a custom comparer to std::set or std::map (it's the second template argument of std::set and the third of std::map). The default is std::less (which uses operator<), but you can provide any binary functor that defines a strict weak ordering relation between your elements.
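
As a rough sketch of what passing a comparer looks like (FaceDescription, FaceLess, FaceCounter and recordFace are hypothetical names, and the two fields are just placeholders for the real ones):

```cpp
#include <map>
#include <string>

// Hypothetical struct standing in for the question's face description.
struct FaceDescription {
    std::string eyeColor;
    int noseType;
};

// Comparer functor: orders by eyeColor first, then by noseType.
// The order is arbitrary but consistent, i.e. a strict weak ordering.
struct FaceLess {
    bool operator()(const FaceDescription& a, const FaceDescription& b) const {
        if (a.eyeColor != b.eyeColor) {
            return a.eyeColor < b.eyeColor;
        }
        return a.noseType < b.noseType;
    }
};

// For std::map the comparer is the third template argument.
using FaceCounter = std::map<FaceDescription, int, FaceLess>;

// Increments the appearance count for `face`.
void recordFace(FaceCounter& counts, const FaceDescription& face) {
    ++counts[face];
}
```

With this in place FaceDescription itself needs no operator<, so the class does not have to pretend that faces have a natural order.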

The most obvious would be to implement `operator<` just by comparing each field [...] But if I have many fields this method will be too long with too many branches, hardly readable and probably hard to maintain.

Unfortunately, C++ has no reflection (not even compile-time reflection, which would solve the situation here), so there's no easy way to automate the "remember to add every field to the comparer whenever I add one to the struct" step.

However, lexicographical comparison of a tuple of heterogeneous values is already solved (in C++11) by std::tuple; you can easily implement an operator< (or, FWIW, your custom comparer) by using std::tie and calling < on the returned tuples:

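#include <tuple>  // std::tie

// Lexicographic comparison: member1 first, then member2, then member3.
bool myComparer(const MyStruct &a, const MyStruct &b) {
    return std::tie(a.member1, a.member2, a.member3) < std::tie(b.member1, b.member2, b.member3);
}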

You can find a similar example at its reference page on cppreference.com.

Is there an efficient and generic way to define a "unique identifier" to the class (maybe with some `std` methods) based on the fields values so that this "unique identifier" could be used to compare/sort objects of that class?

Creating a unique identifier to compare/sort objects (i.e. one that satisfies the constraints of a strict weak ordering) depends on the exact details of your object. But if, as you say, your objects have no meaningful ordering (besides an artificial one that you can impose by lexicographically comparing their components), you probably don't actually want such a thing; you just want to be able to use associative containers.

Enter std::unordered_map and std::unordered_set (actually hashtables behind the standardese decoy names); what they require is a "somewhat unique" identifier that can quickly discriminate between different keys, AKA a hash function, and they can retrieve your element on average in O(1) time. In C++11, this function is std::hash.

The standard already defines specializations of it for primitive types plus some other library types (std::string, for example); you can define your own hash (following the standard signature; see at the bottom for an example specialization) by combining the hashes of the individual components of your struct. The combination can range from a plain XOR or sum to something more elaborate.
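
A minimal sketch of how this could fit together for the counting use case follows. The names Face, FaceHash, FaceCounts and noteFace are hypothetical, and the shift-and-XOR combination is only the simple end of the range mentioned above. Note that the unordered containers also need an equality comparison, since two different keys may share a hash value.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical struct standing in for the question's face description.
struct Face {
    std::string eyeColor;
    int noseType;
};

// Unordered containers need equality in addition to a hash,
// because different keys may collide on the same hash value.
inline bool operator==(const Face& a, const Face& b) {
    return a.eyeColor == b.eyeColor && a.noseType == b.noseType;
}

// Hasher passed as a template argument, so nothing is added to namespace std.
struct FaceHash {
    std::size_t operator()(const Face& f) const {
        const std::size_t h1 = std::hash<std::string>()(f.eyeColor);
        const std::size_t h2 = std::hash<int>()(f.noseType);
        return h1 ^ (h2 << 1);  // simple shift-and-XOR combination
    }
};

using FaceCounts = std::unordered_map<Face, int, FaceHash>;

// Increments the appearance count for `face` (average O(1) per call).
void noteFace(FaceCounts& counts, const Face& face) {
    ++counts[face];
}
```

Here no ordering is defined at all; the container only ever asks "same hash bucket?" and "equal?", which matches the case where operator< would have no logical meaning.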

Upvotes: 2

fatihk

Reputation: 7919

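You can create your own hash function that takes the class members as arguments, and then store your objects in a std::map or std::unordered_map using these hash values as keys, so that you don't have to compare each new object against all the objects already in the map. You can also use std::hash for this purpose. Keep in mind that two different objects can produce the same hash value, so using the object itself as the key of a std::unordered_map, with this function as its hasher, avoids merging distinct objects.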

You can specialize std::hash for a user defined class (from the reference):

#include <iostream>
#include <functional>
#include <string>

struct S
{
    std::string first_name;
    std::string last_name;
};

namespace std
{
    template<>
    struct hash<S>
    {
        typedef S argument_type;
        typedef std::size_t result_type;

        result_type operator()(argument_type const& s) const
        {
            result_type const h1 ( std::hash<std::string>()(s.first_name) );
            result_type const h2 ( std::hash<std::string>()(s.last_name) );
            return h1 ^ (h2 << 1);
        }
    };
}

int main()
{
    S s;
    s.first_name = "Bender";
    s.last_name =  "Rodriguez";
    std::hash<S> hash_fn;

    std::cout << "hash(s) = " << hash_fn(s) << "\n";
}

Upvotes: 0
