Reputation: 459
Consider a collection of sets A1
, A2
,...,An
. I want to determine the most efficient algorithm for finding which of these sets are subsets of a different set B
.
For example, let the input of the algorithm be:
A1 = [1 2]
A2 = [2 3 4]
A3 = [1 3]
B = [1 2 3]
The algorithm should return:
output = [1 3]
since A1
and A3
are subsets of B
but A2
is not.
Upvotes: 1
Views: 1160
Reputation: 13065
The simple O(N) answer: Start with both lists sorted. Pick the shorter list, and for each entry, check if it's present on the other side. You don't need to do a fancy search of the other list, just keep a pointer and increment until we find or pass our target number.
On the other hand, you can still speed up "O(N)" operations by making each step simpler for the computer. For example, if you only need a count of how many numbers are in common, you can compute that quite quickly with a bitmask
If you are comparing many lists to one special list, and most numbers will NOT be in common, you can create a Bloom Filter. This can tell you "number X is NOT in set Y" very quickly. It can't tell you if the number is present -- you'll have to double-check that manually. This is a speed win if you think MOST numbers are misses, and only a few will be hits.
Upvotes: 2