Sergio Manchado

Reputation: 143

Minimize sum of columns in matrix by permuting elements in Python

I have the following matrix:

([2, 5, 5, 10]
 [7, 1, 4, 1]
 [1, 3, 3, 9])

If the columns are summed the result is:

[10, 9, 12, 20]

My objective is to find the best way to reorder the elements within the different rows so as to minimize the maximum of the column sums.

For example, one possibility would be:

([2, 5, 5, 10]
 [7, 1, 4, 1]
 [1, 9, 3, 3])

If the columns are summed the result is:

[10, 15, 12, 14]

This is a better solution than the first one, since the maximum column sum drops from 20 to 15.

The easiest way to do this is to check all the possible permutations, but this method gets incredibly slow in Python as the matrix grows.
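For reference, the exhaustive check could look roughly like the sketch below. The function names are made up for illustration, and keeping the first row fixed is an assumption added here: reordering whole columns does not change the set of column sums, so only the remaining rows need to be permuted.

```python
from itertools import permutations, product


def column_sums(matrix):
    """Sum each column of a list-of-rows matrix."""
    return [sum(col) for col in zip(*matrix)]


def brute_force_min_max(matrix):
    """Try every arrangement of rows 2..n and keep the smallest maximum column sum."""
    first, rest = tuple(matrix[0]), matrix[1:]
    # set() drops duplicate permutations caused by repeated values in a row.
    options = [set(permutations(row)) for row in rest]
    best_cost, best_matrix = None, None
    for combo in product(*options):
        candidate = [first, *combo]
        cost = max(column_sums(candidate))
        if best_cost is None or cost < best_cost:
            best_cost, best_matrix = cost, candidate
    return best_cost, [list(row) for row in best_matrix]


matrix = [
    [2, 5, 5, 10],
    [7, 1, 4, 1],
    [1, 3, 3, 9],
]
cost, arrangement = brute_force_min_max(matrix)
print(cost)         # 14
print(arrangement)  # one arrangement that reaches that cost
```

The number of candidates grows factorially with the row length and exponentially with the number of rows, which is why this approach stops being usable very quickly.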

Any idea to do this in a faster way?

Upvotes: 3

Views: 1333

Answers (3)

HellSaint

Reputation: 111

How optimal do you want your optimal to be?

The problem you have stated is, to the best of my knowledge, NP-complete. Several heuristics exist that find sub-optimal solutions in polynomial time. For example, in the paper "Permuting Elements within Columns of a Matrix in Order to Minimize Maximum Row Sum", Coffman and Yannakakis propose an algorithm with complexity O(m^2 n) (where the matrix is m x n) whose result is, in the worst case, 1.5 - (0.5/m) times the optimal. Note that their problem is equivalent to yours (just transpose your matrix). Since the paper is behind a paywall, I am not sure I can reproduce their algorithm here, but take a look if you are interested (most universities give their students free access through their online libraries).

Other algorithms exist (e.g. "Approximation Algorithms for the Assembly Line Crew Scheduling Problem" by Hsu), and there may be more recent literature that I am not familiar with.

I would like to note that the solutions in the literature perform rather poorly on your example, so they may only be interesting if you are looking for mathematical guarantees on worst-case scenarios.

The point here is: if your goal is to minimize the maximum element in the sum of columns, then the answer to the question "Any idea to do this in a faster way?" is: there is none. The problem is NP-complete. The best you can do is try out all possibilities and hope you find the optimal solution early enough.

That said, it is a trade-off between how complex you can accept your algorithm to be and how far from optimal you can accept the result to be in the worst case.

A few insights about the problem: in general, it is easy to find examples where swapping a single pair of elements cannot improve the objective, although the current arrangement is obviously not optimal. For example, consider the matrix

 2     1     0
 0     1     2
 2     1     0

The worst sum is in the first column. However, if you swap any of the 2s with any other element in its row, the maximum does not improve at all. For example, the algorithm in gbtimmon's answer gets stuck on this matrix and returns column sums of [2, 3, 4]. The same is true for qwerty's answer. However, a trivially optimal solution is

 1     2     0
 0     1     2
 2     0     1

but to get from the original matrix to this optimal one, you necessarily have to perform a permutation that does not improve the cost at first. For example, you can start by permuting the positions (1, 2) and (1, 3).
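The claim can be checked with a few lines of Python. This is only a small sketch (the helper names are invented for the illustration): it lists the cost of every matrix reachable by swapping one pair of elements within a row.

```python
from itertools import combinations


def cost(matrix):
    """Objective: the largest column sum."""
    return max(sum(col) for col in zip(*matrix))


def single_swap_costs(matrix):
    """Cost of every matrix obtained by swapping two elements of one row."""
    costs = []
    for r, row in enumerate(matrix):
        for i, j in combinations(range(len(row)), 2):
            swapped = [list(x) for x in matrix]
            swapped[r][i], swapped[r][j] = swapped[r][j], swapped[r][i]
            costs.append(cost(swapped))
    return costs


stuck = [[2, 1, 0], [0, 1, 2], [2, 1, 0]]
optimal = [[1, 2, 0], [0, 1, 2], [2, 0, 1]]

print(cost(stuck))                    # 4
print(min(single_swap_costs(stuck)))  # 4, so no single swap improves the cost
print(cost(optimal))                  # 3
```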

An idea

One possibility, for which I have no performance guarantees, is to try a tree-like algorithm. Define your loss/cost/objective function as the maximum of the column sums, and take a starting matrix A. From matrix A, consider all pair-wise permutations that lead to a smaller or equal cost. For example, starting from

 2     1     0
 0     1     2
 2     1     0

we consider all pair-wise permutations that lead to a cost smaller than or equal to 4. In this case, the only permutations which would be excluded are:

 2     1     0
 2     1     0
 2     1     0

and

 2     1     0
 1     0     2
 2     1     0

and all other permutations have a cost of exactly 4.

Then, we consider all pair-wise permutations of all the matrices we just got. In the m x n convention of the paper above (n lines of m permutable elements each), each matrix has n * C(m, 2) pair-wise permutations, where C(m, 2) = m(m - 1)/2; for the 3 x 3 example this is 3 * 3 = 9. We then cut out all the matrices with non-minimum cost from this tree. In this case, all other matrices have cost 4, so we don't cut any. Then, we do the same thing for each matrix. This leads to the optimal solution in my example, and, in your example, it gives two equally optimal solutions, which are

 5    10     5     2
 7     1     4     1
 1     3     3     9

and

 5     2     5    10
 7     1     4     1
 1     9     3     3

which are clearly the same solution, minus a permutation of columns. Both have sum {12, 12, 13, 14} (not ordered by the columns). Note, however, that this algorithm will become complex as the size of the matrices increases, especially if there are many solutions which provide the same cost. There is also a lot of redundancy if two pair-wise permutations achieve the same cost. Nonetheless, it should be faster than trying out all possibilities, as we are excluding "bad" ones.

Sorry for not implementing it in python to present the whole code, but I think the algorithm is clear and easy enough for the reader to implement. :)
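For anyone who wants a starting point, here is a minimal sketch of how the idea could be written in Python. It is one possible reading of the description above, not a tuned implementation: the names `tree_search`, `neighbours` and `max_levels` are made up, and the `seen` set and the level cap are additions to guarantee termination.

```python
from itertools import combinations


def cost(state):
    """Objective: the largest column sum."""
    return max(sum(col) for col in zip(*state))


def neighbours(state):
    """All matrices reachable by swapping two different values within one row."""
    for r, row in enumerate(state):
        for i, j in combinations(range(len(row)), 2):
            if row[i] == row[j]:
                continue  # swapping equal values changes nothing
            new_row = list(row)
            new_row[i], new_row[j] = new_row[j], new_row[i]
            yield state[:r] + (tuple(new_row),) + state[r + 1:]


def tree_search(matrix, max_levels=20):
    """Level-by-level search that keeps only the minimum-cost matrices."""
    start = tuple(tuple(row) for row in matrix)
    best_state, best_cost = start, cost(start)
    frontier, seen = {start}, {start}
    for _ in range(max_levels):
        # Keep swaps that do not make the cost worse and are not yet explored.
        candidates = {
            nb
            for state in frontier
            for nb in neighbours(state)
            if nb not in seen and cost(nb) <= best_cost
        }
        if not candidates:
            break
        # Cut every branch whose cost is not the minimum of this level.
        best_cost = min(cost(c) for c in candidates)
        frontier = {c for c in candidates if cost(c) == best_cost}
        seen |= frontier
        best_state = next(iter(frontier))
    return best_cost, [list(row) for row in best_state]


print(tree_search([[2, 1, 0], [0, 1, 2], [2, 1, 0]])[0])           # 3
print(tree_search([[2, 5, 5, 10], [7, 1, 4, 1], [1, 3, 3, 9]])[0])  # 14
```

Storing the matrices as tuples of tuples makes them hashable, so already visited matrices can be skipped cheaply, which removes part of the redundancy mentioned above.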

Upvotes: 1

qwerty

Reputation: 116

Here is an idea:

  1. Pick the two columns with the smallest and the largest sum. Note their difference, d.
  2. Inspect the elements in both columns. Find the row with the largest difference d' (element in the largest-sum column minus element in the smallest-sum column) such that 0 < d' < d.
  3. Swap the two elements in that row.
  4. Repeat steps 1-3 until step 2 is no longer possible.

Example: Given

([2, 5, 5, 10]
 [7, 1, 4, 1]
 [1, 3, 3, 9])

We pick the two columns with the smallest and the largest sum. Counting from 0, column 1 has the smallest sum (9) and column 3 the largest (20). For these two columns, the difference of their sums, d, is 11.

([5, 10]
 [1, 1]
 [3, 9])

Now we find the largest difference d' such that d' < d and d' > 0, which is 9 - 3 = 6 in the last row. We now swap the two elements in that row. So we have

([2, 5, 5, 10]
 [7, 1, 4, 1]
 [1, 9, 3, 3])

This matrix has column sums of [10, 15, 12, 14].

Repeat the above process one more time, then you will end up with the following:

([5, 2, 5, 10]
 [7, 1, 4, 1]
 [1, 9, 3, 3])

The resulting matrix has column sums of [13, 12, 12, 14]. At this point, step 2 is no longer possible, so we are done.
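The steps above could be sketched in Python as follows. The function name is invented for the illustration, and two details are assumptions not spelled out in the steps: when several columns tie for the smallest or largest sum, the first one is used, and d' is taken as the element in the largest-sum column minus the element in the smallest-sum column.

```python
def balance_by_swaps(matrix):
    """Repeatedly swap one row's entries between the lightest and heaviest column."""
    rows = [list(row) for row in matrix]
    while True:
        sums = [sum(col) for col in zip(*rows)]
        # Step 1: columns with the smallest and largest sum (first one on ties).
        lo = min(range(len(sums)), key=sums.__getitem__)
        hi = max(range(len(sums)), key=sums.__getitem__)
        d = sums[hi] - sums[lo]
        # Step 2: rows whose difference d' satisfies 0 < d' < d.
        candidates = [
            (row[hi] - row[lo], r)
            for r, row in enumerate(rows)
            if 0 < row[hi] - row[lo] < d
        ]
        if not candidates:
            return rows, sums
        # Step 3: swap in the row with the largest admissible d'.
        _, r = max(candidates)
        rows[r][lo], rows[r][hi] = rows[r][hi], rows[r][lo]


result, sums = balance_by_swaps([[2, 5, 5, 10], [7, 1, 4, 1], [1, 3, 3, 9]])
print(result)  # [[5, 2, 5, 10], [7, 1, 4, 1], [1, 9, 3, 3]]
print(sums)    # [13, 12, 12, 14]
```

Each swap moves both affected column sums strictly between their previous values, so the loop always terminates.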

Upvotes: 5

gbtimmon

Reputation: 4322

First, let's strengthen your requirement. You could ask:

"Can I produce a matrix that minimizes the difference between the max sum and the min sum of each column in my matrix" 

This is good because:

  1. It will satisfy your original requirement so solving this solves your question
  2. With this requirement it is easy to show sub-optimality in each iteration so we can convince ourselves that a greedy approach is going to work.

To implement a greedy solution, just keep a running sum for each column and, for each row, insert the lowest value of the current row into the column with the highest running sum, the next lowest into the next highest, and so on. This ensures that the columns are as evenly stacked as possible.

This will take m inserts for each of the n rows, plus two O(m log m) sorts per row, so it should run in O(n*m + n*2*m log m), i.e. O(n m log m).

output_mat = []

input_mat = [
     [2, 5, 5, 10],
     [7, 1, 4, 1],
     [1, 3, 3, 9],
]

row_size = len(input_mat[0])
running_sum = [0] * row_size

for row in input_mat:
    # Column indices of this row, ordered from smallest to largest value.
    sorted_idx = [
        x[0] for x in
        sorted(enumerate(row), key=lambda x: x[1])
    ]

    # Column indices ordered from largest to smallest running sum.
    sum_sorted_idx = [
        x[0] for x in
        sorted(enumerate(running_sum), key=lambda x: x[1], reverse=True)
    ]

    # Pair the k-th smallest value with the k-th largest running sum.
    new_val_row = [None] * row_size
    for col_idx, val_idx in zip(sum_sorted_idx, sorted_idx):
        new_val_row[col_idx] = row[val_idx]
        running_sum[col_idx] += row[val_idx]

    output_mat.append(new_val_row)

for x in output_mat:
    print(">> %s" % x)
print(running_sum)

Output:

>> [2, 5, 5, 10]
>> [7, 1, 4, 1]
>> [3, 9, 3, 1]
[12, 15, 12, 12]

Upvotes: 2
