Reputation: 24547
Is there an algorithm similar to the bloom filter, that allows you to:
In the special case where one set only has a single same and you don't compress it, this problem reduces to probablistic set membership, for which one can consider the Bloom filter.
Upvotes: 1
Views: 47
Reputation: 24547
The Apache datasketches library has interesting functionality:
https://datasketches.apache.org/docs/DistinctCountFeaturesMatrix.html
"Theta sketches" support intersection, and count operators.
Upvotes: 0
Reputation: 24547
OK, so here's a terrible answer just to get the discussion going:
You could compress both sets as Bloom filters. Choose a very large number of random samples from the larger set both sets are drawn from.
For each sample you pick, look it up in the bloom filters.
If both indicate "probable member", then you have found a "probable intersection" and the sets are "probably overlapping". Otherwise you declare that the sets are "probably not overlapping".
Upvotes: 0