sp00m
sp00m

Reputation: 48837

Finding the "expanded to factored" algorithm

This question is about algorithms and thus language-independent.


Given the following rows:

They can be factored as follow:

+----+----+----+----+
| A1 | B1 | C1 | D1 |
| A2 | B2 |    |    |
| A3 |    |    |    |
+----+----+----+----+
| A1 | B1 | C2 | D1 |
+----+----+----+----+

The following objects can store those data:

class ExpandedRow {
  String a;
  String b;
  String c;
  String d;
}

class FactoredRow {
  List<String> as;
  List<String> bs;
  List<String> cs;
  List<String> ds;
}

Concerning the transformations algorithms, the factored --> expanded one is quite easy:

List<FactoredRow> factoredRows = fill();
List<ExpandedRow> expandedRows = empty();
for each factoredRow in factoredRows {
  for each a in factoredRow.as {
    for each b in factoredRow.bs {
      for each c in factoredRow.cs {
        for each d in factoredRow.ds {
          expandedRows.add(new ExpandedRow(a, b, c, d));
        }
      }
    }
  }
}

But I'm lost concerning the expanded --> factored one. How can I factorize a List<ExpandedRow> into a List<FactoredRow>?

In other words, I have the factored table as input. I expand it using the provided algorithm and store it in its expanded state. The question is: how to retrieve the initial factored state after having expanding it?


I thought that if two expanded rows have only one attribute that differs, they can be factored, for example A1, B1, C1, D1 (1) and A1, B1, C2, D1 (2). But if we factorize those two rows together, we will end with:

+----+----+----+----+
| A1 | B1 | C1 | D1 |
|    |    | C2 |    |
+----+----+----+----+
| A1 | B2 | C1 | D1 |
| A2 |    |    |    |
| A3 |    |    |    |
+----+----+----+----+
| A2 | B1 | C1 | D1 |
| A3 |    |    |    |
+----+----+----+----+

Which is less factored than the initial table.

It's seems that there are many factored solutions, and the main issue is to define and to find the most factored one.

Upvotes: 0

Views: 139

Answers (1)

Edward Doolittle
Edward Doolittle

Reputation: 4100

This problem seems something like a graph partitioning problem. I suspect it's NP-hard but I haven't been able to prove it yet.

Let's take a simpler example to see what's going on. Consider the pairs (A1,B1), (A2,B1), (A3,B1), (A2,B2). We represent the points as points in 2D-space, and connect points if it is possible to move from one to the other by a translation parallel to the x- or y-axis:

           (A2,B2)
              |
(A1,B1) -- (A2,B1) -- (A3,B1)

The idea is to partition the graph by lines parallel to the axes, and repartition each partition, and so on, until we get pieces that are complete rectangles, line segments, or points.

There are two esssentially different ways of partitioning the graph above. We can draw a vertical line at position x=1.5:

           (A2,B2)
              |
(A1,B1)    (A2,B1) -- (A3,B1)

after which the right-side piece needs to be further partitioned (by a vertical or horizontal line, let's take horizontal):

           (A2,B2)

(A1,B1)    (A2,B1) -- (A3,B1)

We have now factored the original list into

A1 B1
-----
A2 B2
-----
A2 B1
A3

On the other hand, if we had made our initial partition with a horizontal line at position y=1.5, we would have

           (A2,B2)

(A1,B1) -- (A2,B1) -- (A3,B1)

which is already nicely factored into a point and a line segment:

A2 B2
-----
A1 B1
A2
A3

In higher dimensions (4D for letters A, B, C, D) we have a similar problem, except that there are correspondingly more choices for initial cuts, and the allowed final pieces are higher-dimesional (not just points, line segments, and rectangles but also 3D and 4D boxes).

The problem feels NP-hard to me, just like many other graph partitioning problems, but there are probably reasonably fast approximation algorithms.

Upvotes: 1

Related Questions