Reputation: 3
I'm looking for an algorithm that returns (the length of) the intersection of two given N^4 lists / sets (more precisely in [|0;7|]^4). I'm not expecting an answer in a specific programming language (I'm doing it in OCaml, but as this language isn't very popular I don't expect answers to be written in Caml...). Note that solutions involving big pre-computations (for example to completely change the structure of the sets) aren't a problem in my case.
I tried to consider the N^4 sets as classical integer sets and then intersect them using built-in set intersections (but this was way too slow for my purpose), and I also tried to consider them as N^2 sets to apply the Lebesgue fast N^2 intersect (this could have worked, but it turns out the points aren't distributed very randomly over the plane, which makes this algorithm quite inefficient).
Upvotes: 0
Views: 93
Reputation: 7740
You should get a free speedup by converting your 4-tuples to integers, so I would keep doing that. Just a simple conversion to octal, each component a separate digit, should do, and it is fairly fast since it's just bit shifts and ors:
number = tuple[0] | tuple[1] << 3 | tuple[2] << 6 | tuple[3] << 9
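Since the question is about OCaml, here is a minimal sketch of that encoding in OCaml (the names encode and decode are just for illustration):
(* pack a 4-tuple with components in [|0;7|] into an int in [|0;4095|] *)
let encode (a, b, c, d) = a lor (b lsl 3) lor (c lsl 6) lor (d lsl 9)

(* recover the components by masking out 3 bits at a time *)
let decode n = (n land 7, (n lsr 3) land 7, (n lsr 6) land 7, (n lsr 9) land 7)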
Regarding the set intersection, in most cases you aren't going to be able to beat O(n) time, where n is the size of the smaller of the two sets. If you know anything about the distribution of the sets, or how unique their elements are expected to be, we may be able to come up with some heuristics to speed up the intersection in practice.
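For illustration, a sketch of that bound in OCaml, assuming the sets are stored as hash tables mapping each element to unit (hash membership tests are expected O(1), so the whole loop is expected O(min(|a|, |b|))):
(* iterate over the smaller table, probe the larger one *)
let inter_length a b =
  let small, large =
    if Hashtbl.length a <= Hashtbl.length b then (a, b) else (b, a)
  in
  Hashtbl.fold (fun x () acc -> if Hashtbl.mem large x then acc + 1 else acc) small 0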
Update
Here's an example of how you could speed up the computation with knowledge of the distribution. Let's say we know that the vast majority of elements occur very infrequently across the sets you intend to preprocess. What you could do is break the sets into two separate data structures: one that maps a pair of sets to the uncommon elements they share, and a second one that holds the remaining (common) elements of each set. The first would be an easy O(1) lookup, and could be bounded to some small, manageable size. The second would require an O(n) naive intersection as before, but the sets would be drastically smaller because we said the vast majority of elements are uncommon.
Here is an implementation in Python:
from itertools import combinations
from collections import Counter, defaultdict
from random import sample

N = 10 ** 7
K = 10 ** 5
S = 100

input_sets = [set(sample(range(N), k=K)) for _ in range(S)]

# get frequency of each element
counts = Counter()
for input_set in input_sets:
    counts.update(input_set)

# mark all elements occurring in less than 5 sets as uncommon
uncommon = {val for val, count in counts.items() if count < 5}

# break each input_set into common / uncommon parts
common_sets = [input_set - uncommon for input_set in input_sets]
uncommon_sets = [input_set & uncommon for input_set in input_sets]

# create reverse lookup table from values to sets containing them
uncommon_lookup = defaultdict(list)
for idx, uncommon_set in enumerate(uncommon_sets):
    for val in uncommon_set:
        uncommon_lookup[val].append(idx)

# create mapping of input set pairs to shared uncommon elements
uncommon_matrix = defaultdict(set)
for val, indices in uncommon_lookup.items():
    for i, j in combinations(indices, 2):
        uncommon_matrix[i, j].add(val)
Running a performance benchmark, we can see that it is about 30x faster.
from timeit import timeit

test_pairs = [sample(range(S), k=2) for _ in range(100)]

def naive_benchmark():
    for i, j in test_pairs:
        input_sets[i] & input_sets[j]

def fast_benchmark():
    for i, j in test_pairs:
        (common_sets[i] & common_sets[j]) | uncommon_matrix[i, j]

print(timeit("naive_benchmark()", setup="from __main__ import naive_benchmark", number=10) / 10)
print(timeit("fast_benchmark()", setup="from __main__ import fast_benchmark", number=10) / 10)
Output:
0.3008148681998136
0.009065380399988499
Of course, if your sets are all very similar and composed mostly of common elements, we could lean the other way and focus on optimizing that case instead.
Upvotes: 2
Reputation: 18892
It is not clear what performance target you are aiming for. Normal set intersection with at most 4096 elements should not be that slow, at least for an intersection on generic sets. Typically, the worst-case scenario (intersecting the full set with itself) takes around 70µs with the OCaml standard Set on my computer, whereas the best-case scenario (intersecting two empty sets) takes 6ns. In particular, computing only the length in O(n) and avoiding building the result set doesn't seem to improve the performance in the worst-case scenario.
Thus, we are looking at optimizing constants by fitting the algorithm to the data set. Since your sets have at most 8^4 = 4096 possible elements, they can be represented as bitsets: strings of 4096/8 = 512 bytes, with one bit per possible element. With this representation, intersecting two sets is a matter of taking the logical and of the two strings, and we can optimize further by computing the length on the fly:
let set_size = 4096 / 8 (* 512 bytes per bitset *)
(* count the 1 bits in a byte; a precomputed 256-entry table would be faster *)
let popcount b =
  let rec go acc n = if n = 0 then acc else go (acc + (n land 1)) (n lsr 1) in
  go 0 b
let inter x y =
  let r = ref 0 in
  for i = 0 to set_size - 1 do
    r := !r + popcount (Char.code x.[i] land Char.code y.[i])
  done;
  !r
This is enough to decrease the computation time to 200 ns with an OCaml implementation of popcount (which counts the number of 1 bits in a character).
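For completeness, here is one way the 512-byte strings could be built from the integer-encoded elements (a sketch; to_bitset is a hypothetical helper, and Bytes is used during construction because OCaml strings are immutable):
(* build a 512-byte bitset from a list of elements in [|0;4095|]:
   bit n of the string is set iff n belongs to the set *)
let to_bitset elements =
  let buf = Bytes.make set_size '\000' in
  List.iter
    (fun n ->
       let i = n lsr 3 and mask = 1 lsl (n land 7) in
       Bytes.set buf i (Char.chr (Char.code (Bytes.get buf i) lor mask)))
    elements;
  Bytes.to_string buf
This conversion is paid once per set as a precomputation; after that, inter (to_bitset s1) (to_bitset s2) always processes exactly 512 byte pairs, whatever the set contents.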
Going further is possible, but at that point we need either to exploit more structure of the data set, or to fit the algorithm to the hardware by going down to C in order to use the hardware popcount instruction and SIMD instructions that compute the logical and on larger batches of bytes.
Upvotes: 1