Reputation: 502
Let's say I have a HashMap:
{
"foo" => (A,B),
"bar" => (B,C),
"baz" => (A,D)
}
And I transform it into some other data-structure, something similar to:
{
(A,B) => "foo",
(B,C) => "bar",
(A,D) => "baz",
(D,E) => "biz"
}
Given the 'key' (or query I suppose): (A,B,C)
I'd like to lookup all values where their key is a subset of the query set: ("foo","bar")
I'm trying to find an efficient algorithm/data-structure that will do that lookup efficiently, although I don't think there's a O(1) solution.
One idea I had was to break up the transformed map to look like this:
{
A => ("foo","baz"),
B => ("foo","bar")
C => ("bar"),
D => ("baz","biz"),
E => ("biz")
}
Then lookup each element not in the query set: (D => ("baz", "biz"), E => ("biz"))
Union them: ("baz", "biz")
And take the difference from the set of all possible results: ("foo", "bar", "baz", "biz") - ("baz", "biz") => ("foo", "bar")
While this works, it's a lot of steps with an num_of_set_elements number of unions, and on large sets with a small query, a possibly huge number of lookups.
So, does anyone have a better solution?
Upvotes: 2
Views: 160
Reputation: 8180
The basic solution does not need to build a new structure: iterate through the pairs (key, value) and if the value is a subset of your target, yield the key. The expected time is O(n) (n = number of keys) if you can represent the sets with a 64 bits integer (less than 64 atomic values A
, B
, C
, ...). See https://cs.calvin.edu/activities/books/c++/ds/2e/WebItems/Chapter09/Bitsets.pdf for example (S contains A if A & S == S), and more if you have to check the potential subsets another way.
But if you have enough room and a long time for preprocessing data, there's a (crazy?) amortized O(1) time solution for lookup.
If S
is the set of all possible atomic values (A
, B
, ...), you can build all the possible supersets of every value ((A, B)
, (A, C)
, ...) that are subsets of S
.
Now you can build a new map like this (pseudocode):
for each pair (key, value):
for every superset s of value:
add key to m[s]
The lookup will be in constant amortized time O(1). The map will have roughly 2^|S|
keys. The size of the values will depend on the size of the keys, e.g. the biggest key, S
, will contain every key of the initial map. Preprocessing will be something like O(n*2^|S|).
I'd rather go for O(n) though.
Upvotes: 2