pgfearo
pgfearo

Reputation: 2246

Algorithm for calculating all intersects and disjoints for an indefinite number of sets

Recently I had to develop a naive recursive algorithm for the set problem outlined below, but would now like to know what the formal description/name of the problem is and whether there's an algorithm that allows the problem to be solved more efficiently (I suspect there is).

I know there are separate algorithms for finding intersects and disjoints, but I haven't recognized any that cover this problem as a whole.

The problem:

To take an indefinite number of sets and return one set for each intersect and non-intersect of all the sets.

For example, given 4 sets as input A,B,C,D:

A {1,14,2,10,13,12,8,9}

B {2,3,4,15,11,13,9}

C {13,10,4,15,5,6,12,11}

D {7,8,9,13,11,6,12}

enter image description here

The result should be the following 13 sets:

{1,14},{2},{3},{4,15},{5},{6},{7},{8},{9},{10},{11},{12},{13}

enter image description here

The algorithm I developed was naive in that it compared all set permutations recursively (I wanted to be able to implement this relatively easily in XSLT 2.0) until no further intersects were found.

Some context, I needed to perform this operation to describe how multiple versions of a tree modify the original tree, at each point on the tree. The 'tree' is actually an XML document.

[Update] Showing the 1st algorithm in the accepted answer run against the 3 sets: {1,2,3},{2,3,4},{2,3,5}

enter image description here

Compared to my original nieve algorithm run over the 3 sets:

enter image description here

Upvotes: 2

Views: 244

Answers (1)

Save
Save

Reputation: 11938

The problem becomes quite easy, once some set property are clear.

A int B int C = ( A int B ) int C

what this tells us is that we can solve the problem for 2 sets, and then intersect every result with the the third set, and so on.

By solving hte problem, I mean simply that, A int B generates 3 sets:

A op B => A int B , A - (A int B) , B - (A int B)

once this is clear, the implementation comes simply, and should look like:

resultset = emptyset
while ( queue not empty )
   tmpset = emptyset
   A = queue.pop()
   residual = A
   for B in resultset
     tmpset = tmpset + {A int B , B - (A int B)}
     residual = residual - B
   tmpset = tempset + residual
   resultset = tmpset

obviously if any of the set is empty, it doesn't need to be added to the solution. This is some C# code that solves the problem using the dictionary idea:

        var query = arr.Select((x, i) => x.Select(y => new { Elem = y, Index = i }))
            .SelectMany(x => x)
            .GroupBy(x => x.Elem)
            .ToDictionary(x => x.Key, y => new HashSet<int>(y.Select(z => z.Index).ToList()))
            .GroupBy(x => x.Value, HashSet<int>.CreateSetComparer())
            .Select(x=>x.Select(y=>y.Key).ToList())
            .ToList();

Upvotes: 1

Related Questions