Reputation: 520
We have some objets (about 100,000 objects), each object has a property with some integers (range from 1 to 20,000, at most 20 elements, no duplicated elements.):
For example:
[1, 4]
[1, 3]
[100]
And the problem is, we input a array of integer (called A), find out which objects hold the subset of A.
For example:
A = [1]
, the output should be []
A = [1, 4]
, the output should be [object_1]
A = [1, 3, 4]
, the output should be [object_1, object_2]
The problem can be described in python:
from typing import List
# problem description
class Object(object):
def __init__(self, integers):
self.integers = integers
def size(self):
return len(self.integers)
object_1 = Object([1, 4])
object_2 = Object([1, 3])
object_3 = Object([100])
def _find_subset_objects(integers): # type: (List[int]) -> List[Object]
raise NotImplementedError()
def test(find_subset_objects=_find_subset_objects):
assert find_subset_objects([1]) == []
assert find_subset_objects([1, 4]) == [object_1]
assert find_subset_objects([1, 3, 4]) == [object_1, object_2]
Is there some algorithm or some data struct is aim to solve this kind of problem?
Upvotes: 1
Views: 43
Reputation: 2604
Store the objects in an array. The indices will be 0 ... ~100K. Then create two helper arrays.
For every integer property p where 0 < p <= 20000, create a list of indices where index i in the list means that the element is contained in the i-th object.
It is getting messy and I'm getting lost in the names. Time for the example with the objects that you used in the question:
objects = [[1, 4], [1, 3], [100]]
obj_total = [2, 2, 1]
current_object_count = [0, 0, 0]
object_1_ref = [0, 1]
object_2_ref = [ ]
object_3_ref = [1]
object_4_ref = [0]
object_100_ref = [100]
object_refs = [object_1_ref ,object_2_ref , ... , object_100_ref]
#Note that you will want to do this references dynamically.
#I'm writing them out only for the sake of clarity.
Now we are given the input array, for example [1, 3, 4]. For every element i in the array, we look we look at the object_i_ref. We then use the indices in the reference array to increase the values in the current_object_count array.
Whenever you increase a value in the current_object_count[x]
, you also check against the obj_total[x]
array. If the values match, the object in objects[x]
is a subset of the input array and we can note it down.
When you finish with the input array you have all the results.
Upvotes: 1