Steve W
Steve W

Reputation: 1128

Check if unordered_set contains all elements in other unordered_set - C++

I'm brand new to C++ and have been asked to convert a Java program to C++. I'm trying to write a method to check that all elements in an unordered_set exist in another unordered_set. I found the example below using hash_set but hash_set is deprecated and it is recommended to use unordered_set now.

// returns true if one contains all elements in two
bool SpecSet::containsAll(hash_set<Species*> one, hash_set<Species*> two) {
   sort(one.begin(), one.end());
   sort(two.begin(), two.end());
   return includes(one.begin(), one.end(), two.begin(), two.end());
}

So I need a way to do this using unordered_set. Sort does not work on unordered sets and lookup speed is important, so I don't want to use an ordered set.

bool SpecSet::containsAll(unordered_set<Species*> one, unordered_set<Species*> two) {

   return ?;
}

I'd really appreciate some help with an approach to doing this efficiently.

EDIT: I guess this will work. It seems there is no more efficient way but to loop over all in two.

bool SpecSet::containsAll(unordered_set<Species*> one, unordered_set<Species*> two) {
   if(two.size() > one.size())
   {
      return false;
   }

   for(Species *species : two)
   {
      if(one.find(species) == one.end())
      {
         return false;
      }
   }
   return true;
}

Upvotes: 6

Views: 4201

Answers (2)

AMA
AMA

Reputation: 4214

Disclaimer: This is not the most efficient approach. It is an attempt at solution that would be as generic and flexible as std::includes while supporting unordered iterator ranges. It is not limited to std::unordered_set and should work for any other container, e.g., std::vector or std::list.


As it was pointed out std::includes requires the input ranges to be sorted. At this moment unordered ranges are not supported in the standard library.

Looking at possible implementations of std::includes a version for unordered ranges can be implemented. For example like so:

template<class InputIt1, class InputIt2>
bool includes_unordered(
    InputIt1 first1, InputIt1 last1,
    InputIt2 first2, InputIt2 last2)
{
    for (; first2 != last2; ++first2)
    {
        InputIt1 it1;
        for (it1 = first1; it1 != last1; ++it1)
        {
            if(*first2 == *it1)
                break;
        }
        if (it1 == last1)
            return false;
    }
    return true;
}

Note: containers' size-comparison optimization is not performed to support containers of non-unique objects. But if needed it can be done using std::distance.

And here's a version taking an equivalence operator:

template<class InputIt1, class InputIt2, class Equivalence>
bool includes_unordered(
    InputIt1 first1, InputIt1 last1,
    InputIt2 first2, InputIt2 last2,
    Equivalence equiv)
{
    for (; first2 != last2; ++first2)
    {
        InputIt1 it1;
        for (it1 = first1; it1 != last1; ++it1)
        {
            if(equiv(*first2, *it1))
                break;
        }
        if (it1 == last1)
            return false;
    }
    return true;
}

Small live-example

Then includes_unordered can be used in the same way as std::includes would.

Upvotes: 1

Toby Speight
Toby Speight

Reputation: 30831

With unsorted collections, there's no faster algorithm than to iterate over the smaller collection while testing that each element is a member of the larger. This will naturally scale as O(n), where n is the size of the putative subset, since we perform an O(1) find operation n times.


Here's some demonstration code, with tests:

#include <unordered_set>

template <typename T>
bool is_subset_of(const std::unordered_set<T>& a, const std::unordered_set<T>& b)
{
    // return true if all members of a are also in b
    if (a.size() > b.size())
        return false;

    auto const not_found = b.end();
    for (auto const& element: a)
        if (b.find(element) == not_found)
            return false;

    return true;
}
int main()
{
    const std::unordered_set<int> empty{ };
    const std::unordered_set<int> small{ 1, 2, 3 };
    const std::unordered_set<int> large{ 0, 1, 2, 3, 4 };
    const std::unordered_set<int> other{ 0, 1, 2, 3, 9 };

    return 0
        +  is_subset_of(small, empty) // small ⊄ ∅
        + !is_subset_of(empty, small) // ∅ ⊂ small
        +  is_subset_of(large, small) // large ⊄ small
        + !is_subset_of(small, large) // small ⊂ large
        +  is_subset_of(large, other) // large ⊄ other
        +  is_subset_of(other, large) // other ⊄ large
        + !is_subset_of(empty, empty) // ∅ ⊂ ∅
        + !is_subset_of(large, large) // x ⊂ x, ∀x
        ;
}

An equivalent, using standard algorithm instead of writing an explicit loop:

#include <algorithm>
#include <unordered_set>

template <typename T>
bool is_subset_of(const std::unordered_set<T>& a, const std::unordered_set<T>& b)
{
    // return true if all members of a are also in b
    auto const is_in_b = [&b](auto const& x){ return b.find(x) != b.end(); };

    return a.size() <= b.size() && std::all_of(a.begin(), a.end(), is_in_b);
}

(obviously using the same main() for tests)


Note that we pass the sets by reference, not by value, as you've indicated that the sets are too large for you to copy and sort them.

Upvotes: 1

Related Questions