Rusan Kax
Rusan Kax

Reputation: 1894

Efficient, or fast, size of the set intersection of two vectors

I find myself needing to return the size of the intersection of two vectors:

std::vector<int> A_, B_

I do not require the intersected values, just the size of the set. This function needs to be called a very large number of times. This is part of a much bigger simulation over a (mathematical) graph/network.

My working conditions are:

My first attempt, using a naive loop, is below. But I think this may not be enough. I've assumed...that std::set_intersection will be too onerous due to repeated sorts and allocations.

   int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_) {

      int c_count=0;

  for(std::vector<int>::const_iterator it = A_.begin(); it != A_.end(); ++it){
     for(std::vector<int>::const_iterator itb = B_.begin(); itb != B_.end(); ++itb){

      if(*it==*itb) ++c_count;
     }
  }

  return c_count;
}

Given my conditions above, how else can I implement this to gain speed, relatively easily? Should I be thinking about hash tables or going with sorts and STL, or different containers?

Upvotes: 13

Views: 6409

Answers (2)

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726589

Your algorithm is O(n2) in the number of elements (assuming that the size of both vectors is approximately equal to n). Here is an O(n) algorithm:

  • Create an std::unordered_set<int>
  • Put all items of vector A into the set
  • Go through all items of vector B, checking that they are present in the unordered_set, and incrementing the count for each item that is present.
  • Return the final count.

Here is an implementation in C++11, using a lambda for brevity:

vector<int> a {2, 3, 5, 7, 11, 13};
vector<int> b {1, 3, 5, 7, 9, 11};
unordered_set<int> s(a.begin(), a.end());
int res = count_if(b.begin(), b.end(), [&](int k) {return s.find(k) != s.end();});
// Lambda above captures the set by reference. count_if passes each element of b
// to the lambda. The lambda returns true if there is a match, and false otherwise.

(this prints 4; demo)

Upvotes: 17

Billy ONeal
Billy ONeal

Reputation: 106549

Your algorithm is O(n*m), where n and m are the number of elements in the vectors.

If you don't have issues where the input data is untrusted, you'll probably have the best results with:

  • Place all the elements of A into an unordered_set
  • For each element in B, if it is in the set, increment your counter.

For example:

int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_)
{
    std::unordered_set<int> aSet(A_.cbegin(), A_.cend());
    return std::count_if(B_.cbegin(), B_.cend(), [&](int element) {
        return aSet.find(element) != aSet.end();
        });
}

This will probabilistically give O(m + n) results. (Hash tables are almost always O(1), but if an attacker can force many collisions in the table they could force O(n) behavior, leading to denial of service)

If you require deterministic results, and the order of the vectors does not matter, sorting one vector will work, which is only O(m lg m + m + n). That is:

  • Sort the first vector
  • For each element in the second vector, use binary search to determine if the element is in the first vector, and if so, increment your counter.

For example:

int vec_intersect(std::vector<int>& A_, const std::vector<int>& B_)
{
    std::sort(A_.begin(), A_.end());
    return std::count_if(B_.cbegin(), B_.cend(), [&](int element) {
        return std::binary_search(A_.cbegin(), A_.cend(), element);
        });
}

Just for giggles, here's an <algorithm>-ized version of your algorithm:

int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_)
{
    return std::count_if(B_.cbegin(), B_.cend(), [&](int element) {
        return std::find(A_.cbegin(), A_.cend(), element) != A_.cend();
        });
}

Upvotes: 3

Related Questions