sanity
sanity

Reputation: 35772

How can I efficiently find subsets of a set in a map?

Consider that I have a map of sets of values to values, in Java the type of this map would be:

Map<Set<Object>, Object> setToObjMap;

Given a new set of objects set, I wish to find all values in the setToObjMap where the associated key is a subset of a "search set".

So, for example, if my map was:

["telephone", "hat"] -> "book"
["laugh", "fry", "mouse"] -> "house"
["dog", "cat"] -> "monster"

Then, given the search set ["telephone", "hat", "book", "dog", "cat"] I would retrieve the values "book" and "monster".

In practice there may be tens of thousands of entries in the setToObjectMap, with tens of thousands of possible values in the sets. The search set will typically have around 10 elements.

I'm hoping there is an efficient way to do this that doesn't require iterating through all keys in the map. Can anyone offer any suggestions?

Upvotes: 3

Views: 751

Answers (5)

Marcin
Marcin

Reputation: 49816

If the members of your sets are subjet to some ordering, then you could hold them in a tree structure, and attach the key-value mappings at the leaves. Then, when you follow the path of a subset down the tree, all leaves under that subtree will be sets containing your subset.

Upvotes: 0

Fred Foo
Fred Foo

Reputation: 363527

Iterating over the map is one option. This takes O(n × m) time, where n is the number of entries in the map and m is the number of items in the query set; the m factor comes about because of the subset check.

The other option is generating all subsets of the set to search for and looking those up in the map. That takes O(2^m) time. That may be faster than the first option if 2^m is small compared to n (so m should be very small). In your example use case, 2^m = 2^10 = 1024, which is less than tens of thousands.

If the query set size is known to vary, you can even use a hybrid strategy: compute the number 2^m and check whether it is smaller than n, then select the best of these two options depending on the result of the check.

Upvotes: 1

solendil
solendil

Reputation: 8468

You can create a lookup data structure

Map<String,List<Finder>>

With Finder having an int count and max, and a res word. Take note that the list is there to take care of the case where many sets in setToObjMap can share the same word, which is not in your examples.

"telephone" -> [{res:"book",count=0,max=2}]
"hat" -> same object as above
"laugh" -> [{res:"house",count=0,max=3}]
...

This lookup collection is quick to build and even quicker to flush after a lookup.

The lookup algorithm iterates through set, for each word, and each Finder for this word, it increases the count variable. Second pass, take all values of the lookup map, if count==max, put res in the result.

Init algorithm:

for Entry e in setToObjMap
  Finder f = new Finder(e.value, 0, e.key.size) // res, count, max
  for String word in e.key
    lookup.get(word).add(f)

Lookup algorithm:

for String word in set
  for Finder f in lookup.get(word)
    f.count ++
for Finder f in lookup.values()
  if (f.count==f.max)
    res.add(f.res)

Reset algorithm:

for Finder f in lookup.values()
    f.count = 0

As for the complexity, if n is the number of elements in set and m the number of values in setToObjMap, the complexity will be O(n+m)

Upvotes: 3

kan
kan

Reputation: 28951

Yet another option is to build an index from single element to the key sets:

"hat" -> ["telephone", "hat"]
"telephone" -> ["telephone", "hat"]
"laugh"->["laugh", "fry", "mouse"]
"fry"->["laugh", "fry", "mouse"]
"mouse"->["laugh", "fry", "mouse"]
"dog" -> ["dog", "cat"]
"cat" -> ["dog", "cat"]

It will allow quickly query key sets by input.

Upvotes: 1

Daniel Fischer
Daniel Fischer

Reputation: 183858

If the sets in question are small, and the map is large, the best way would be to generate all subsets of the set and look those up in the map.

If your set has k elements and there are n associations in the map, that would take 2^k lookups vs. n subset checks the other way round. You see that for n = 1000 and k = 20 that would be a bad idea, but for n = 100000 and k = 10, it would be a win.

Upvotes: 1

Related Questions