jonem
jonem

Reputation: 459

Efficient search algorithm for determining which sets are a subset of a larger set

Consider a collection of sets A1, A2,...,An. I want to determine the most efficient algorithm for finding which of these sets are subsets of a different set B.

For example, let the input of the algorithm be:

A1 = [1 2]
A2 = [2 3 4]
A3 = [1 3]
B = [1 2 3]

The algorithm should return:

output = [1 3] 

since A1 and A3 are subsets of B but A2 is not.

Upvotes: 1

Views: 1160

Answers (1)

BraveNewCurrency
BraveNewCurrency

Reputation: 13065

The simple O(N) answer: Start with both lists sorted. Pick the shorter list, and for each entry, check if it's present on the other side. You don't need to do a fancy search of the other list, just keep a pointer and increment until we find or pass our target number.

On the other hand, you can still speed up "O(N)" operations by making each step simpler for the computer. For example, if you only need a count of how many numbers are in common, you can compute that quite quickly with a bitmask

If you are comparing many lists to one special list, and most numbers will NOT be in common, you can create a Bloom Filter. This can tell you "number X is NOT in set Y" very quickly. It can't tell you if the number is present -- you'll have to double-check that manually. This is a speed win if you think MOST numbers are misses, and only a few will be hits.

Upvotes: 2

Related Questions