Reputation: 5366
Is there an algorithm (preferably constant time) to check if set A is a subset of set B?
Creating the data structures to facilitate this problem does not count against the runtime.
Upvotes: 6
Views: 6571
Reputation: 2778
If you have a list of the least common letters and pairs of letters in your string set, you can store your sets sorted with their least common letters and letter pairs and maximize your chances of tossing out negative matches as quickly as possible. It's not clear to me how well this would combine with a bloom filter, Probably a hash table will do since there aren't very many digrams and letters.
If you had some information about the maximum size of subsets or even a common size you could similarly preproccess data by putting all of the subsets of a given size into a bloom filter as mentioned.
You could also do a combination of both of these.
Upvotes: 0
Reputation: 7807
You might go for bloom filter (http://en.wikipedia.org/wiki/Bloom_filter ). However there might be false positives, which can be addressed by the method mentioned by Keith above (but note that the worst case complexity of hashing is NOT O(n), but you can do O(nlogn).
Upvotes: 0
Reputation: 23265
Well, you're going to have to look at each element of A
, so it must be at least linear time in the size of A
.
An O(A+B)
algorithm is easy using hashtables (store elements of B
in a hashtable, then look up each element of A
). I don't think you can do any better unless you know some advance structure for B
. For instance, if B
is stored in sorted order, you can do O(A log B)
using binary search.
Upvotes: 1