Reputation: 5089
I have 2 arrays (A
and B
) that contain similar data with some differences. I would like to return an array of objects that are only in A
and another array of objects that are only in B
. So far I have been thinking:
What are my other options? Any languages/solutions are fair game.
Upvotes: 2
Views: 933
Reputation: 73181
I'd stuff array A's elements into a hash table, then iterate over array B doing lookups in the hash table to efficiently determine which elements in B are also in A. Then do the same with B's elements in a hash table while iterating over array A. That would be O(N) throughout.
Upvotes: 2
Reputation: 17939
Try using Sets. They usually have a difference() method (or something like this) that returns exactly what u want. Simple as that. Once it's language-agnostic, how you create sets or transform the difference into an array is done using general methods.
Set A = createSetA();
Set B = createSetB();
Array onlyAElements = transformToArray(A.difference(B));
Array onlyBElements = transformToArray(B.difference(A));
Alternatively, you could sort both arrays, and get both difference arrays at the same time. Something like
int aIndex = 0;
int bIndex = 0;
Array aOnly = new Array();
Array bOnly = new Array();
while (aIndex != a.length || bIndex != b.length)
{
if (A[aIndex] == B[bIndex]
{
aIndex++;
bIndex++;
}
else if (A[aIndex] > B[bIndex])
{
aOnly.add(A[aIndex]);
aIndex++;
}
else
{
bOnly.add(B[bIndex]);
bIndex++;
}
}
You should keep in mind there's some mistakes on getting out of bounds. But the code is just to get the main idea.
Upvotes: 0
Reputation: 12803
A lot of this is going to depend on what type of data you have. You mention sorting, so I take it the elements are comparable. With sets of size m
and n
, this will take O(m lg m + n lg n)
to sort, and that will dominate. (Asymptotically, it won't matter if you do binary search or walk both lists. Walking both lists should be O( m + n)
.) Of course, if you are using data with a better sort algorithm available, such as integers with radix-sort, you should be able to get down to O( m + n)
.
Using sets (as others are suggesting) implicitly suggests using hashing, which will definitely make your problem easier. If you hash all the elements in A ( O(m)
) and store all the hashes in a hash set in memory, then hash B ( O(n)
) and detect where collisions may occur in the hash set. This becomes a matter for optimization: you have to evaluate a classic speed-memory trade-off. The larger your hash set, the quicker the collision-checks will be. This will run in O( m + n )
.
It's worth noting that you can prove that any algorithm that does what you ask will run in at least m + n
time, since all the inputs need to be looked at.
Upvotes: 1
Reputation: 28122
I don't have an implementation or algorithm beyond what's already been said but I thought I would leave this solution in c#/linq for anyone who might find this question and wants to do this:
var a = new int[] { 1, 2, 3, 6, 7, 8, 9, 10 };
var b = new int[] { 1, 2, 3, 4, 5, 6, 7 };
int[] addedToA = a.Except(b);
int[] missingFromA = b.Except(a);
foreach (var i in addedToA)
{
Console.Write("{0} ", i);
}
Console.WriteLine();
foreach (var i in missingFromA)
{
Console.Write("{0} ", i);
}
This prints:
8 9 10
4 5
Upvotes: 0
Reputation: 347426
You could sort both arrays then do a linear scan through both arrays at the same time. This would be an O(nlogn) algorithm for the sort and O(n) for the scan / building of the new arrays.
Upvotes: 6