Algorithm: Find out which objects hold the subset of input array

Question

We have some objets (about 100,000 objects), each object has a property with some integers (range from 1 to 20,000, at most 20 elements, no duplicated elements.):

For example:

object_1: [1, 4]
object_2: [1, 3]
object_3: [100]

And the problem is, we input a array of integer (called A), find out which objects hold the subset of A.

For example:

when A = [1], the output should be []
when A = [1, 4], the output should be [object_1]
when A = [1, 3, 4], the output should be [object_1, object_2]

The problem can be described in python:

from typing import List


# problem description

class Object(object):

    def __init__(self, integers):
        self.integers = integers

    def size(self):
        return len(self.integers)


object_1 = Object([1, 4])
object_2 = Object([1, 3])
object_3 = Object([100])


def _find_subset_objects(integers):  # type: (List[int]) -> List[Object]
    raise NotImplementedError()


def test(find_subset_objects=_find_subset_objects):
    assert find_subset_objects([1]) == []
    assert find_subset_objects([1, 4]) == [object_1]
    assert find_subset_objects([1, 3, 4]) == [object_1, object_2]

Is there some algorithm or some data struct is aim to solve this kind of problem?

Shamis · Accepted Answer

Store the objects in an array. The indices will be 0 ... ~100K. Then create two helper arrays.

First one with the element counts for every object. I will call this array obj_total(This could be ommited by calling the object.size or something similar if you wish.)
Second one initialized with zeroes. I will call it current_object_count.

For every integer property p where 0 < p <= 20000, create a list of indices where index i in the list means that the element is contained in the i-th object.

It is getting messy and I'm getting lost in the names. Time for the example with the objects that you used in the question:

objects = [[1, 4], [1, 3], [100]]
obj_total = [2, 2, 1]
current_object_count = [0, 0, 0]
object_1_ref = [0, 1]
object_2_ref = [ ]
object_3_ref = [1]
object_4_ref = [0]
object_100_ref = [100]
object_refs = [object_1_ref ,object_2_ref , ... , object_100_ref]
#Note that you will want to do this references dynamically.
#I'm writing them out only for the sake of clarity.

Now we are given the input array, for example [1, 3, 4]. For every element i in the array, we look we look at the object_i_ref. We then use the indices in the reference array to increase the values in the current_object_count array.
Whenever you increase a value in the current_object_count[x], you also check against the obj_total[x] array. If the values match, the object in objects[x] is a subset of the input array and we can note it down.

When you finish with the input array you have all the results.

Algorithm: Find out which objects hold the subset of input array

Answers (1)

Related Questions