Gabriel Garcia
Gabriel Garcia

Reputation: 1018

How to reduce a set of sets to one where no element is a subset of another?

First of all: no, this is not homework! This fits into a solver I'm writing for a game. I just reduced the problem to this concise statement:

Given a set of sets S, find and remove any elements of S which are subsets of other elements of S.

Domain

1 <= |S| <= C^K

1 <= |Si| <= K

2 <= C <= 10

10 <= K <= 500

Details

Si is a subset of [0, K)

min(|Si|) >= logC(|S|)

My current approach is to keep each set inside S sorted via what I call a "NatSet" which is simply a bool[K]. I then sort S by |Si| and do a O(|S|^2) search to find elements which are subsets of other elements. This, unfortunately, is too slow for my target values C=6 and K=16*9.

Upvotes: 3

Views: 910

Answers (2)

Tony
Tony

Reputation: 6158

Considering I can't try it (the input of set S is invisible for me) and decide whether the following are helpful, I can only call them some tips for you:

  1. When comparing two set: 'diminished' binary search: when you compare whether set A (x1...xn) is the subset of set B( y1...ym ), suppose you find yk = x1 (n <= m)

        y1, y2, ...,yk, yk+1 ... ym
                    |
                   x1   
    

    then you can search x2 in the range of [yk+1, ym]. And the other are the same.

  2. When choosing two sets to compare: Choose the large one(ie, s1 and sk, then s1 and sk-1), you said you sort it by size and the large one is more likely to contain it.
  3. Sort Si? I am not sure whether sort S can improve your performance and you can try without sorting or using a max-heap (see tip 4)
  4. Using max-heap: compare the leaf with root and one of root's son... as the graph show).

enter image description here

If leaf doesn't removed (ie, isn't the subset of it's father), just remove it from heap and you can swap it to the place where the leaf is removed. (as following show)

enter image description here

Notice: Using heap can't guarantee all subset are removed.

  1. How to sort?: In your case, it seems that the counting sort is a suitable way to sort which is a linear sorting.

More:

1.And you said you are pruning tree, do you want Alpha–beta pruning ?

2.Efficient list intersection algorithm

Hope this can be helpful.

Upvotes: 2

ArtisanV
ArtisanV

Reputation: 511

The inherent issue here is that given two sets, one with p elements and the other with q elements (without loss of generality, we can say p < q), the fastest way you can get a Union of the two is O(p). This is because if you use HashSets you can reduce your lookup to O(1).

As I understand it, you are doing an O(K^2) search so you get a total run of O((n^2)*(K^2)), where n is the number of elements of S. You could reduce it to O((n^2)K) using the HashSet approach. Here's a link to Java http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html and almost all major languages have some implementation in their standard libraries. Although if your set S is anywhere near n= 6^(16*9) then I don't think this will help much (I'm not even sure if you can fit any relevant data in modern computers of that size).

If you could drive k down to <= 64 (or 32 in 32 bit systems) you could use bit maps to store each set inside S and parallel bit operations to reduce your time down to O(n^2). It would be equivalent to your current approach but you would just do a bitwise or and check if it equals the same number : ((a1 | a2) == a1) || ((a1 | a2) == a2). This way you get a union operation in constant time.

Upvotes: 0

Related Questions