Computing efficiently intersection of n sets

Question

I have n sets identified by setId and each one could contain an arbitrary number of elements that is a pair (elementId, priority).

My algorithm should take in input two setId, and give in output a set containing the first m elements which are in the intersection of the two input set and have the highest priority (sum of priorities).

Example:

n=3, m=1

Set1: { (1, 1), (12, 2) }
Set2: { (1, 4), (23, 6), (33, 22) }
Set3: { (33, 1), (1, 16 }


Input: Set2, Set3
Output: { (33, 23) }

My question is: assuming that I have infinite space what is the best data structures I can use in order to optimize performance?

Of course precomputing all possible intersection isn't a valid answer.

Edit:

Realistic numbers:

n, sets number, is ~ 10^6
average cardinality of sets is ~ 5*10^3.

Mark Ransom · Accepted Answer

Take one of the sets and convert it to a hash map. Iterate the other set, and for each member try to look up the element in the hash map. If you find it, add the result to a heap; if the size of the heap grows to one greater than the number of elements you wish to keep, throw away the lowest item in the heap.

Computing efficiently intersection of n sets

Answers (1)

Related Questions