Ian Lee
Ian Lee

Reputation: 31

Finding the unique subset of elements in list of sets

I am trying to find the unique determining subset(s) of a set in a list of sets, i.e. a subset E of set A such that A is the only set containing E (I'm not sure mathematically how to call it). For example, for the following list of sets:

set A: {2,3,5}
set B: {2,3}
set C: {2,3,7}
set D: {3,7}
set E: {2,11,13}
set F: {2}

The unique subset are:

set A: {5}
set B: {}
set C: {2,7}
set D: {}
set E: {{11},{13},{11,13}}
set F: {}

And the result shows relationships like given the set contains 2 and 7 it must be set C, or we can't determine a unique set if we only has element 3. Note that the elements do not necessarily need to be numbers, they can be any objects.

Upvotes: 1

Views: 1237

Answers (3)

Alessandro
Alessandro

Reputation: 11

I recently wrote a Python package that is intended to solve this problem efficiently, with the only difference being that I'm interested in finding the smallest unique determining subsets: https://github.com/alussana/TrieSUS

During my research I was quite surprised not to find a name for this algorithmic question, and I could only find brute-force approaches that involve enumerating and comparing powersets to find the solution(s) for each set - which is very inefficient, and especially slow when considering sets for which a solution doesn't exist.

My algorithm uses a trie data structure and a series of linear time operations to first greatly reduce the problem size, and to then transform it into the equivalent of a set cover problem, the solution of which is extracted using OR-Tools's constrained programming solver. More information about the algorithm and its performance can be found in the repository.

Upvotes: 1

Prathibha Chiranthana
Prathibha Chiranthana

Reputation: 822

You can use this method for your implementation. I write some method for checking is a subset String Sets.

public boolean isASubset( Set<String> parent,Set<String> child) {

    boolean[] result = new boolean[1];
    child.forEach((String s) -> {
                if (!parent.contains(s)) {
                    result[0] = true;
                }
            }

    );

    return result[0];
}

Upvotes: -1

Louis Maddox
Louis Maddox

Reputation: 5576

Not sure about the name for what you're doing here but mathematically I'd approach it by comparing the difference of powersets, but since you're only interested in subsets then discard the full set from the powerset (a powerset is all possible subsets, including the empty set and the full set).

The problem is to find unique subsets of a set against the powersets of the other sets. In Python, this is done by repeatedly checking for the tuple representing a particular subset [of the given set] in a list of tuples representing the powerset of one of the other sets, across all n - 1 (so in your example, 5) powersets.

Here's an implementation in Python, reading from a text file containing your input:

from itertools import chain, combinations as comb
import re

def ProcessSet(l):
    """Turn a line [read from text file] into a tuple of (label, values)."""
    label = l[l.find(':')-1]
    vals = re.compile('{(.+)}').findall(l.rstrip())[0].split(',')
    return label, vals

def GetSubsets(s):
    """
    Get all subsets of a given set (including the empty set).
    """
    return list(chain(*map(lambda x: comb(s, x), range(0, len(s)))))

def GetPowerset(s):
    """
    Get the powerset of a given set (all subsets incl. empty and full set).
    """
    return list(chain(*map(lambda x: comb(s, x), range(0, len(s)+1))))

# read the text lines into a list
with open("set_list.txt", "r") as f:
    sets = [ProcessSet(l) for l in f.readlines()]

all_subsets = [GetSubsets(s[1]) for s in sets]
powersets  = [GetPowerset(s[1]) for s in sets]

for i in range(0, len(sets)):
    # declare label (l) and subsets (ss) for current loop iteration
    l, ss = sets[i][0], all_subsets[i]
    # list the other powersets to compare against
    ops = [x for ind, x in enumerate(powersets) if ind != i]
    # find unique subsets: those that are only a subset of the current set
    # and not found in the powerset of any of the other sets
    uss = list(set(ss)-set([x for y in ops for x in y if x in ss]))
    #uss = []
    #for s in ss:
    #    contains_s = [(s in ps) for ps in ops]
    #    if not any(contains_s):
    #        uss.append(s)
    str_uss = ', '.join([f"({', '.join(x)})" for x in uss])
    print(f"set {l}: {str_uss}")

Outputting:

set A: (3, 5), (2, 5), (5)
set B: 
set C: (2, 7)
set D: 
set E: (11), (2, 13), (13), (11, 13), (2, 11)
set F:

The answers are a bit different to what you suggested they should be, but they seem to be correct for what you described. Hope that helps!

Upvotes: 0

Related Questions