Reputation: 93
Suppose we have m ordered sets and we want to find their intersection.
Which data structures should we use for the ordered sets and which algorithm would be the most efficient?
same question: Algorithm for N-way merge
It appears that the literature is huge. Thus a better question is this: Which good implementations are there?
Upvotes: 3
Views: 806
Reputation: 93
This is only a sketch: please help me improve it.
This solution will be based on using binary search to limit the search to n/2^i number of elements for each set, and I will use efficient data structures to remember those comparisons for the next number.
The first thing to note is that the balanced binary tree is good at performing binary search, only when the interval of the search closely match that of a (sub)tree.
The other 2 structures that accept binary search is the array and the skip list. The array is inefficient for insertions and deletion, so the skip list seems the best choice.
We will need m arrays of size 64 that will contain the elements of each set per array that were compared in the binary search, inserted in the order of execution of the comparison.
We will also need a double linked list in which all the elements from all the sets that were used in the binary search will be inserted. Using a skip list here minimizes even more the number of comparisons needed.
The basic idea is this.
Upvotes: 0
Reputation: 13223
You can create binary tree with link to parent node and implement classic algorithm of intersection/union:
iterA
to left-most (smallest) node of the tree (i.e. descend over left-most branches to leaf).iterB
to first (smallest) node of the ordered set (if it is implemented with ordered array or to the left-most node if as tree).iterA
and iterB
iterA
iterA
and itemB
iterB
Advance of binary tree iterator by:
Update:
If you know that your ordered set (walked by iterB
) is much smaller than the tree you can use a bit more sophisticated algorithm for intersection:
iterB
to beginning of ordered set (lower value).iterA
to the node which is minimum upper bound of value iterB
.iterA
and iterB
itemB
to the next value.iterA
to the minimum upper bound of value at itemB
starting from current itemA
.itemB
pass all items of ordered set.Where advance to the minimum upper bound from specific node is:
The main idea of searching bound is narrowing upper and minimum bound ("-" is ignored nodes, "..." is new search range):
for B < X < A
U
/ \-
L
-/ \...
for A < X < B
L
-/ \
U
.../ \-
Upvotes: 1