Reputation: 35772
Consider that I have a map of sets of values to values, in Java the type of this map would be:
Map<Set<Object>, Object> setToObjMap;
Given a new set of objects set
, I wish to find all values in the setToObjMap where the associated key is a subset of a "search set".
So, for example, if my map was:
["telephone", "hat"] -> "book"
["laugh", "fry", "mouse"] -> "house"
["dog", "cat"] -> "monster"
Then, given the search set ["telephone", "hat", "book", "dog", "cat"]
I would retrieve the values "book" and "monster".
In practice there may be tens of thousands of entries in the setToObjectMap
, with tens of thousands of possible values in the sets. The search set will typically have around 10 elements.
I'm hoping there is an efficient way to do this that doesn't require iterating through all keys in the map. Can anyone offer any suggestions?
Upvotes: 3
Views: 751
Reputation: 49816
If the members of your sets are subjet to some ordering, then you could hold them in a tree structure, and attach the key-value mappings at the leaves. Then, when you follow the path of a subset down the tree, all leaves under that subtree will be sets containing your subset.
Upvotes: 0
Reputation: 363527
Iterating over the map is one option. This takes O(n × m) time, where n is the number of entries in the map and m is the number of items in the query set; the m factor comes about because of the subset check.
The other option is generating all subsets of the set to search for and looking those up in the map. That takes O(2^m) time. That may be faster than the first option if 2^m is small compared to n (so m should be very small). In your example use case, 2^m = 2^10 = 1024, which is less than tens of thousands.
If the query set size is known to vary, you can even use a hybrid strategy: compute the number 2^m and check whether it is smaller than n, then select the best of these two options depending on the result of the check.
Upvotes: 1
Reputation: 8468
You can create a lookup data structure
Map<String,List<Finder>>
With Finder
having an int count
and max
, and a res
word. Take note that the list is there to take care of the case where many sets in setToObjMap
can share the same word, which is not in your examples.
"telephone" -> [{res:"book",count=0,max=2}]
"hat" -> same object as above
"laugh" -> [{res:"house",count=0,max=3}]
...
This lookup collection is quick to build and even quicker to flush after a lookup.
The lookup algorithm iterates through set
, for each word, and each Finder for this word, it increases the count
variable. Second pass, take all values of the lookup map, if count==max
, put res
in the result.
Init algorithm:
for Entry e in setToObjMap
Finder f = new Finder(e.value, 0, e.key.size) // res, count, max
for String word in e.key
lookup.get(word).add(f)
Lookup algorithm:
for String word in set
for Finder f in lookup.get(word)
f.count ++
for Finder f in lookup.values()
if (f.count==f.max)
res.add(f.res)
Reset algorithm:
for Finder f in lookup.values()
f.count = 0
As for the complexity, if n is the number of elements in set
and m the number of values in setToObjMap
, the complexity will be O(n+m)
Upvotes: 3
Reputation: 28951
Yet another option is to build an index from single element to the key sets:
"hat" -> ["telephone", "hat"]
"telephone" -> ["telephone", "hat"]
"laugh"->["laugh", "fry", "mouse"]
"fry"->["laugh", "fry", "mouse"]
"mouse"->["laugh", "fry", "mouse"]
"dog" -> ["dog", "cat"]
"cat" -> ["dog", "cat"]
It will allow quickly query key sets by input.
Upvotes: 1
Reputation: 183858
If the sets in question are small, and the map is large, the best way would be to generate all subsets of the set and look those up in the map.
If your set has k
elements and there are n
associations in the map, that would take 2^k
lookups vs. n
subset checks the other way round. You see that for n = 1000
and k = 20
that would be a bad idea, but for n = 100000
and k = 10
, it would be a win.
Upvotes: 1