Reputation: 32650
Given are two (either ordered or not ordered) sets A and B of elements of the same type, where an arbitrary number of elements exists in A, B or both.
I would like to determine which elements in A are not contained in B as well as which elements of B are not contained in A.
This can easily be done via
var dA = A.Except(B);
var dB = B.Except(A);
My question: Is this the most efficient algorithm for this task?
Since the above iterates both sets, it appears we might be able to re-use some information from the first iteration to reduce the effort spent on the second.
Upvotes: 1
Views: 92
Reputation: 95308
Let n
be the size of set A
and m
be the size of set B
. If we assume that A
and B
are implemented as hash tables, there's a simple O(min(n,m))
expected time algorithm to find A \ B
and B \ A
.
W.l.o.g let n < m
(otherwise swap the sets).
A
. For every element, check whether it is in B
as well. If yes, remove it from B
. If no, add it to dA
.The result will be in dA
and B
.
If you don't want to destroy B
, you can create a copy of it beforehand, which should be very fast when implemented as a simple memcpy
.
You could instead use a persistent data structure to represent B
, but this adds a considerable cost and is unlikely to be helpful, unless your set sizes are very unbalanced.
Upvotes: 2
Reputation: 80197
If you can iterate sets in order, then it is possible to use algorithm like merge-phase of mergesort:
a = A.First
b = B.First
while a <> Nil and b <> Nil do
if a < b
dA.Add(a)
a = A.Next
else if a > b
dB.Add(b)
b = B.Next
else
a = A.Next
b = B.Next
endwhile
if a <> Nil
dA.Add(rest of A)
if b <> Nil
dB.Add(rest of B)
Upvotes: 1