Joe Cannatti
Joe Cannatti

Reputation: 5089

Comparison Algorithm

I have 2 arrays (A and B) that contain similar data with some differences. I would like to return an array of objects that are only in A and another array of objects that are only in B. So far I have been thinking:

  1. Brute force with some optimizations (this is trivial)
  2. Sort the arrays and use a binary search.

What are my other options? Any languages/solutions are fair game.

Upvotes: 2

Views: 933

Answers (5)

Jeremy Friesner
Jeremy Friesner

Reputation: 73181

I'd stuff array A's elements into a hash table, then iterate over array B doing lookups in the hash table to efficiently determine which elements in B are also in A. Then do the same with B's elements in a hash table while iterating over array A. That would be O(N) throughout.

Upvotes: 2

Samuel Carrijo
Samuel Carrijo

Reputation: 17939

Try using Sets. They usually have a difference() method (or something like this) that returns exactly what u want. Simple as that. Once it's language-agnostic, how you create sets or transform the difference into an array is done using general methods.

Set A = createSetA();
Set B = createSetB();

Array onlyAElements = transformToArray(A.difference(B));
Array onlyBElements = transformToArray(B.difference(A));

Alternatively, you could sort both arrays, and get both difference arrays at the same time. Something like

int aIndex = 0;
int bIndex = 0;

Array aOnly = new Array();
Array bOnly = new Array();

while (aIndex != a.length || bIndex != b.length)
{
   if (A[aIndex] == B[bIndex]
   {
       aIndex++;
       bIndex++;
   }
   else if (A[aIndex] > B[bIndex])
   {
       aOnly.add(A[aIndex]);
       aIndex++;
   }
   else 
   {
       bOnly.add(B[bIndex]);
       bIndex++;
   }
} 

You should keep in mind there's some mistakes on getting out of bounds. But the code is just to get the main idea.

Upvotes: 0

David Berger
David Berger

Reputation: 12803

A lot of this is going to depend on what type of data you have. You mention sorting, so I take it the elements are comparable. With sets of size m and n, this will take O(m lg m + n lg n) to sort, and that will dominate. (Asymptotically, it won't matter if you do binary search or walk both lists. Walking both lists should be O( m + n).) Of course, if you are using data with a better sort algorithm available, such as integers with radix-sort, you should be able to get down to O( m + n).

Using sets (as others are suggesting) implicitly suggests using hashing, which will definitely make your problem easier. If you hash all the elements in A ( O(m) ) and store all the hashes in a hash set in memory, then hash B ( O(n) ) and detect where collisions may occur in the hash set. This becomes a matter for optimization: you have to evaluate a classic speed-memory trade-off. The larger your hash set, the quicker the collision-checks will be. This will run in O( m + n ).

It's worth noting that you can prove that any algorithm that does what you ask will run in at least m + n time, since all the inputs need to be looked at.

Upvotes: 1

Robert Cartaino
Robert Cartaino

Reputation: 28122

I don't have an implementation or algorithm beyond what's already been said but I thought I would leave this solution in c#/linq for anyone who might find this question and wants to do this:

    var a = new int[] { 1, 2, 3, 6, 7, 8, 9, 10 };
    var b = new int[] { 1, 2, 3, 4, 5, 6, 7 };

    int[] addedToA = a.Except(b);
    int[] missingFromA = b.Except(a);

    foreach (var i in addedToA)
    {
        Console.Write("{0} ", i);
    }
    Console.WriteLine();
    foreach (var i in missingFromA)
    {
        Console.Write("{0} ", i);
    }

This prints:

8 9 10
4 5

Upvotes: 0

Brian R. Bondy
Brian R. Bondy

Reputation: 347426

You could sort both arrays then do a linear scan through both arrays at the same time. This would be an O(nlogn) algorithm for the sort and O(n) for the scan / building of the new arrays.

Upvotes: 6

Related Questions