Reputation: 1
I have a set of vectors of length 4 (represented as a Nx4 matrix) where each element in the vector can take on the value -1, 0 or 1. I want to group the vectors into the smallest number of groups (and therefor the largest groups) such that each group satisfies the following constraint:
There must be a vector in the group for each combination of unique elements represented in each column of the vectors in the group.
For example, a group containing only the vectors [-1,1,0,1]
and [-1,1,1,1]
would satisfy the constraint because there is one unique value in each column of the group except the 3rd, which has 2, thus there are 2 possible combinations of the unique values in each column, both of which are represented in the group. However, grouping [1,0,-1,0]
and [1,0,0,1]
does not satisfy the constraint because there are 2 unique values in each of the 3rd and 4th columns, creating 4 possible combinations, only 2 of which are represented in the group. Adding [1,0,0,0]
and [1,0,-1,1]
to this group would satisfy the constraint. (Note that any single vector as a group on its own will always satisfy the constraint.)
These groups are "easily described" because you can just list the unique values for each column and that will fully describe the group, to the exclusion of all other vectors.
My first approach was to take the set as a whole, and first check if it already satisfies the constraint. If not, try leaving out one vector at a time and check if the remaining vectors satisfy the constraint. If none of those work then try leaving out all combinations of 2 vectors, then 3, and so on. Each time a particular subset satisfies the constraint, set those vectors aside and repeat the process on the remaining ones until there are none left. While this guarantees an optimal grouping (as far as I can tell), the run time is way too long for any set with more than ~25-30 vectors because you have to potentially check N choose k possible ways of leaving out a subset of vectors for all values of k from 1 to N-1.
I recently realized that you can think of this as more of a geometry problem if you imagine the space of possible vectors as a 3 by 3 by 3 by 3 hypercube where each unit hypercube represents a single vector. Groups that satisfy the constraint are hyper-rectangles (including wrapping around from -1 to 1) in this space which is potentially easier to think about than the original phrasing of the constraint. In this framing of the problem, I am looking for the minimum number of hyper-rectangles such that all vectors are contained in the hyper-rectangles, and no empty spaces exist in any hyper-rectangle. This approach has the promise of not exploding the run time combinatorially, but I haven't been able to come up with a good way of searching through the possible hyper-rectangles.
Does anyone have ideas for a faster algorithm to solve this problem?
Upvotes: 0
Views: 131
Reputation: 336
First of all, let's talk about your first approach which is globally a greedy algorithm. You pick out the biggest set you can find iteratively. It is a nice heuristic but does not guarantee the optimal grouping. Here is a 6-vector example in dimension 3 (4th is alwats 0 for instance):
(0, 0, -1); (0, 0, 0); (0, 0, 1); (0, -1, -1); (0, 1, 1); (1, 0, 0)
Here is some representation with the green node out of plan (6th).
Your algorithm will take first the only 3-set available: (0, 0, -1); (0, 0, 0); (0, 0, 1); (red line)
Letting 3 isolated vectors meaning a total of 4 sets. You can obviously make 3 sets of 2 vectors instead (1-4, 2-6, 3-5). (black lines)
An important question to solve this problem, is to know if a vector can be used in 2 different sets.
If no, this is clearly a NP-problem in my opinion. The greedy algorithm is the most reasonable way to treat it. You can save time by building a graph with all vectors as nodes and the edges meaning "compatible in the same group", that is to say that no hole make this association impossible. Then you look for the maximum clique.
If yes, I believe you can solve it optimally with a min cost flow algorithm. You have to list all acceptable sets, all represented by one node linked to source with a 1-cost and pouring in the up to 81 vector nodes, pouring themselves to the sink. There are about V=10000 acceptable sets with the 81 vectors presents. And some algorithm let you solve this in O(VElogVlogV), still better than a !81. Fortunately some "holes" drop V fast.
Upvotes: 0