Prodiction

Reputation: 187

Choosing a one-to-one result from a similarity matrix

I built a function that finds an alignment according to some metric.

It receives a matrix of precomputed similarity values. For example, weighted_res may be:

[[0.2, 0.5, 0.3],
 [0.1, 0.2, 0.4],
 [0.8, 0.2, 0.4],
 [0.1, 0.2, 0.7],
 [0.1, 0.2, 0.4]]

My function maximizes the sum of the selected values over all pairings of the indices of exs1 (rows) and exs2 (columns), where no row or column index may be used twice. The result is the set of optimal index pairs. Here, (0,1), (2,0), (3,2), i.e. 0.5+0.8+0.7, produces the maximal score.

There are many cases where taking the maximum of each column or row isn't enough. Let the matrix be:

[[0.1, 0.0, 0.1],
 [0.5, 0.6, 0.4],
 [0.5, 0.8, 0.3],
 [0.0, 0.0, 0.2]]

Here, it chooses (1,0), (2,1), (3,2), because 0.5+0.8+0.2 is the maximal reachable score.
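To make the two examples above easy to check, here is a minimal standard-library sketch that enumerates every one-to-one assignment. The names `best_assignment`, `mat_a` and `mat_b` are illustrative and not part of my real code, and it assumes the matrix has at least as many rows as columns.

```python
from itertools import permutations

def best_assignment(matrix):
    """Exhaustively find the one-to-one (row, column) pairs with maximal sum.

    Assumes len(matrix) >= number of columns, so every column gets one row.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    best_score, best_pairs = float("-inf"), []
    # rows[c] is the row assigned to column c, so no index can repeat.
    for rows in permutations(range(n_rows), n_cols):
        score = sum(matrix[r][c] for c, r in enumerate(rows))
        if score > best_score:
            best_score = score
            best_pairs = sorted((r, c) for c, r in enumerate(rows))
    return best_pairs, best_score

mat_a = [[0.2, 0.5, 0.3],
         [0.1, 0.2, 0.4],
         [0.8, 0.2, 0.4],
         [0.1, 0.2, 0.7],
         [0.1, 0.2, 0.4]]

mat_b = [[0.1, 0.0, 0.1],
         [0.5, 0.6, 0.4],
         [0.5, 0.8, 0.3],
         [0.0, 0.0, 0.2]]

pairs_a, score_a = best_assignment(mat_a)  # pairs: [(0, 1), (2, 0), (3, 2)]
pairs_b, score_b = best_assignment(mat_b)  # pairs: [(1, 0), (2, 1), (3, 2)]
```

In the second matrix, rows 1 and 2 both have their best values in columns 0 and 1, which is why a simple per-row or per-column maximum is not sufficient.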

My code is shown below, and I fear it is maximally inefficient. I would be grateful for a hint towards a more efficient algorithm than computing all possibilities, summing each one, and taking the maximum. Here is the code:

import itertools
import numpy as np

def one_to_one(weighted_res, exs1, exs2, mask):
    # Number of pairs to form: the length of the shorter input.
    inner_cube_len = min(len(exs1), len(exs2))
    turned = False

    # Make exs1 the longer side, so rows are permuted over all columns.
    if len(exs1) < len(exs2):
        exs1, exs2 = exs2, exs1
        weighted_res = weighted_res.T
        mask = mask.T
        turned = True

    # Every ordered selection of rows, one row per column (grows factorially).
    x_to_choose = np.array(list(itertools.permutations(range(len(exs1)), inner_cube_len)))
    y_to_choose = np.arange(len(exs2))

    # Total score of each candidate assignment.
    weighted_res_overall = weighted_res[x_to_choose, y_to_choose].sum(axis=1)

    best_overall_row = np.argmax(weighted_res_overall)
    best_x_values = x_to_choose[best_overall_row]

    # Keep only the pairs that the boolean mask marks as valid.
    valid_mask = mask[best_x_values, y_to_choose]
    best_res1 = best_x_values[valid_mask]
    best_res2 = y_to_choose[valid_mask]

    if not valid_mask.any():
        return [], []
    if turned:
        # Swap back so the results refer to the original exs1/exs2 order.
        left_value = best_res2.tolist()
        right_values = [[x] for x in best_res1.tolist()]
    else:
        left_value = best_res1.tolist()
        right_values = [[x] for x in best_res2.tolist()]
    return left_value, right_values

With inputs of lengths 8 and 6, weighted_res_overall has 20160 entries, and that number grows extremely fast.
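The 20160 is the number of ordered selections of 6 rows out of 8, i.e. 8!/2!. A short sketch of how quickly that count grows (the helper name `candidate_count` is only illustrative, and `math.perm` needs Python 3.8 or later):

```python
from math import perm

def candidate_count(n_rows, n_cols):
    # Ordered selections of n_cols rows out of n_rows: n_rows! / (n_rows - n_cols)!
    return perm(n_rows, n_cols)

for n_rows, n_cols in [(8, 6), (10, 8), (12, 10)]:
    print(n_rows, n_cols, candidate_count(n_rows, n_cols))
# 8 6 20160
# 10 8 1814400
# 12 10 239500800
```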

Upvotes: 1

Views: 265

Answers (2)

Prodiction

Reputation: 187

I found it: it's called the Hungarian algorithm, applied here to maximize the score instead of minimizing it. https://en.wikipedia.org/wiki/Hungarian_algorithm

There is a scipy implementation of it: https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.optimize.linear_sum_assignment.html

Or https://github.com/src-d/lapjv
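For the SciPy route, a minimal sketch on the second matrix from the question could look like this. Since `linear_sum_assignment` minimizes a cost, the similarities are negated here; as far as I know, newer SciPy releases also accept `maximize=True`, but the negation should work on older versions too. The mask handling from my original function is left out.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

weighted_res = np.array([[0.1, 0.0, 0.1],
                         [0.5, 0.6, 0.4],
                         [0.5, 0.8, 0.3],
                         [0.0, 0.0, 0.2]])

# The solver minimizes total cost, so negate the similarities to maximize them.
row_ind, col_ind = linear_sum_assignment(-weighted_res)

# Each row and each column appears at most once in the result.
pairs = list(zip(row_ind.tolist(), col_ind.tolist()))
total = weighted_res[row_ind, col_ind].sum()
print(pairs, total)
```

With more rows than columns, not every row is assigned, which matches the one-to-one behaviour I wanted.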

Thanks for thinking about it!

Upvotes: 0

QuantumChris

Reputation: 1093

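If you transpose the matrix, you can easily find the maximum value of each column as follows. Note that each column's maximum is taken independently, so this does not prevent the same row from being picked for two columns: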

from numpy import array

mat = [[0.2, 0.5, 0.3],
       [0.1, 0.2, 0.4],
       [0.8, 0.2, 0.4],
       [0.1, 0.2, 0.7],
       [0.1, 0.2, 0.4]]

mat = array(mat).T

maxis = [max(col) for col in mat]

If you then want the sum instead of a list of the maximum values, you can replace the final list comprehension with:

max_sum = sum(max(col) for col in mat)

Hope this helps.

Upvotes: 0
