Imbue
Imbue

Reputation: 3907

Algorithm to find if two sets intersect

Let's say I have two arrays:

int ArrayA[] = {5, 17, 150, 230, 285};

int ArrayB[] = {7, 11, 57, 110, 230, 250};

Both arrays are sorted and can be any size. I am looking for an efficient algorithm to find if the arrays contain any duplicated elements between them. I just want a true/false answer, I don't care which element is shared or how many.

The naive solution is to loop through each item in ArrayA, and do a binary search for it in ArrayB. I believe this complexity is O(m * log n).

Because both arrays are sorted, it seems like there should be a more efficient algorithm.

I would also like a generic solution that doesn't assume that the arrays hold numbers (i.e. the solution should also work for strings). However, the comparison operators are well defined and both arrays are sorted from least to greatest.

Upvotes: 14

Views: 7803

Answers (7)

DavidLG
DavidLG

Reputation:

If one list is much much shorter than the other, binary search is the way to go. If the lists are of similar length and you're happy with O(m+n), a standard "merge" would work. There are fancier algorithms that are more flexible. One paper I've come across in my own searches is:

http://www.cs.uwaterloo.ca/~ajsaling/papers/paper-spire.pdf

Upvotes: 4

David Nehme
David Nehme

Reputation: 21597

Since someone wondered about stl. Out-of-the-box, the set_intersection algorithm would do more than you want: it would find all the common values.

    #include <vector>
    #include <algorithm>
    #include <iterator>
    using namespace std;
//    ...    
      int ArrayA[] = {5, 17, 150, 230, 285};
      int ArrayB[] = {7, 11, 57, 110, 230, 250};
      vector<int> intersection;
      ThrowWhenWritten output_iterator;
        set_intersection(ArrayA, ArrayA + sizeof(ArrayA)/sizeof(int),
                         ArrayB, ArrayB + sizeof(ArrayB)/sizeof(int),
                         back_insert_iterator<vector<int> >(intersection));

        return !intersection.empty();

this runs in O(m+n) time, but it requires storing all the duplicates and doesn't stop when it finds the first dup.

Now, modifying the code from the gnu implementation of the stl, we can get more precisely what you want.

 template<typename InputIterator1, typename InputIterator2>
 bool 
 has_intersection(InputIterator1 first1, InputIterator1 last1,
             InputIterator2 first2, InputIterator2 last2)
    {
       while (first1 != last1 && first2 != last2) 
       {
          if (*first1 < *first2)
             ++first1;
          else if (*first2 < *first1)
             ++first2;
          else
             return true;
       }
       return false;
}

Upvotes: 8

JaredPar
JaredPar

Reputation: 755457

If you are using C# 3.0 then why not take advantage of LINQ here?

ArrayA.Intersect(ArrayB).Any()

Not only is this generic (works for any comparable type) the implementation under the hood is pretty efficient (uses a hashing algorithm).

Upvotes: 1

James Curran
James Curran

Reputation: 103575

Glomek is on the right track, but kinda glossed over the algorithm.

Start by comparing ArrayA[0] to ArrayB[0]. if they are equal, you're done. If ArrayA[0] is less than ArrayB[0], then move to ArrayA[1]. If ArrayA[0] is more than ArrayB[0], then move to ArrayB[1].

Keeping stepping through till you reach the end of one array or find a match.

Upvotes: 0

Mr Fooz
Mr Fooz

Reputation: 111986

If the range of values is small, you could build a lookup table for one of them (time cost = O(N)) and then check if the bit is set from the other list (time cost = O(N)). If the range is large, you could do something similar with a hash table.

The mergesort trick from Glomek is an even better idea.

Upvotes: 0

aku
aku

Reputation: 124044

If you don't care about memory consumption, you can achieve good performance by using hash, i.e. create hash with keys = values of one array, and test values of second array against this hash

Upvotes: 3

Andru Luvisi
Andru Luvisi

Reputation: 25338

Pretend that you are doing a mergesort, but don't send the results anywhere. If you get to the end of either source, there is no intersection. Each time you compare the next element of each, if they are equal, there is an intersection.

For example:

counterA = 0;
counterB = 0;
for(;;) {
    if(counterA == ArrayA.length || counterB == ArrayB.length)
        return false;
    else if(ArrayA[counterA] == ArrayB[counterB])
        return true;
    else if(ArrayA[counterA] < ArrayB[counterB])
        counterA++;
    else if(ArrayA[counterA] > ArrayB[counterB])
        counterB++;
    else
        halt_and_catch_fire();
}

Upvotes: 40

Related Questions