Looking up all keys in a hashmap like data structure that are a subset of a query

Question

The Question

Let's say I have a HashMap:

{
  "foo" => (A,B),
  "bar" => (B,C),
  "baz" => (A,D)
}

And I transform it into some other data-structure, something similar to:

{
  (A,B) => "foo",
  (B,C) => "bar",
  (A,D) => "baz",
  (D,E) => "biz"
}

Given the 'key' (or query I suppose): (A,B,C)

I'd like to lookup all values where their key is a subset of the query set: ("foo","bar")

I'm trying to find an efficient algorithm/data-structure that will do that lookup efficiently, although I don't think there's a O(1) solution.

One possible solution

One idea I had was to break up the transformed map to look like this:

{
  A => ("foo","baz"),
  B => ("foo","bar")
  C => ("bar"),
  D => ("baz","biz"),
  E => ("biz")
}

Then lookup each element not in the query set: (D => ("baz", "biz"), E => ("biz"))

Union them: ("baz", "biz")

And take the difference from the set of all possible results: ("foo", "bar", "baz", "biz") - ("baz", "biz") => ("foo", "bar")

While this works, it's a lot of steps with an num_of_set_elements number of unions, and on large sets with a small query, a possibly huge number of lookups.

So, does anyone have a better solution?

jferard · Accepted Answer

The basic solution does not need to build a new structure: iterate through the pairs (key, value) and if the value is a subset of your target, yield the key. The expected time is O(n) (n = number of keys) if you can represent the sets with a 64 bits integer (less than 64 atomic values A, B, C, ...). See https://cs.calvin.edu/activities/books/c++/ds/2e/WebItems/Chapter09/Bitsets.pdf for example (S contains A if A & S == S), and more if you have to check the potential subsets another way.

But if you have enough room and a long time for preprocessing data, there's a (crazy?) amortized O(1) time solution for lookup. If S is the set of all possible atomic values (A, B, ...), you can build all the possible supersets of every value ((A, B), (A, C), ...) that are subsets of S. Now you can build a new map like this (pseudocode):

for each pair (key, value):
    for every superset s of value:
        add key to m[s]

The lookup will be in constant amortized time O(1). The map will have roughly 2^|S| keys. The size of the values will depend on the size of the keys, e.g. the biggest key, S, will contain every key of the initial map. Preprocessing will be something like O(n*2^|S|).

I'd rather go for O(n) though.

Looking up all keys in a hashmap like data structure that are a subset of a query

The Question

One possible solution

Answers (1)

Related Questions