Reputation: 1360
I want to sort A (possibly using B (or even with some reasonable preprocessing)). In general, I want to be able to sort many such arbitrary "As" (all subsets of B) for a given B. What is the best time in which this can be achieved?
Two naive solutions come to mind.
When A is small, the first is clearly better, when A is almost as large as B, the second is better. However, when, say B has something like 10^10 elements and A has 10^5, can we do better than either of these methods?
Upvotes: 2
Views: 296
Reputation: 1199
As an aside, it is unclear how your second solution achieves O(|B|). As you scan B, how do you "take elements that appear in A"? It sounds like there should be a log A factor somewhere.
The best I can contribute is an offline algorithm that sorts many A's in O(|B| + L) where L is the sum of the lengths of all A's. By "offline" I mean you can read all the A's in advance.
In practical terms, this is a memory hog. Your constraint of 10^10 64-bit number already assumes 80 GB available RAM. The hash table requires at least triple that amount. Storing all the A's in memory and the linked lists also require at least 3 L 64-bit values.
I am very skeptical that a practical online algorithm exists that makes good use of B to process one A at a time. Even assuming that you can look up positions in B in O(1) (e.g. with a hash table), that narrows the domain of your elements in A from 64 bits to 34 (10^10 rounded up). So counting sort is still unusable, since the final step requires scanning the domain of 10^10 values to sort a 10^5-element array. What the offline algorithm does is essentially pay off for the huge scan by sorting all the arrays at once.
Upvotes: 2