Aleksandr Prystupa
Aleksandr Prystupa

Reputation: 13

How can I find the max sum of 5 values in this dataframe, but no values can be in the same row or column?

[[-9.92, 23.04, -6.06, -0.72, 21.32],

 [54.98, 15.58, 51.66, 54.1, 43.76],

 [49.22, 5.68, 25.24, 31.8, 43.3],

 [32.1, 15.12, 9.38, 28.96, 40.14],

 [13.2, 10.36, 12.44, -12.02, 15.8]]

I have a pandas data frame using this data structure. I would like to find the maximum sum of 5 values in the data frame, but the values cannot be in the same column or row. For instance, values of [9.92, 15.58, 25.24, 28.96, 15.8] would be permissible, but [9.92, 15.58, 25.24, 28.96, 40.14] & [9.92, 15.58, 25.24, 28.96, -12.02] would not be.

Ideally I would like to generate a list of lists from this data frame that fits my criteria and then find the maximum from there.

[{'Stephen Curry': -9.92,
  'Buddy Hield': 23.04,
  'Duncan Robinson': -6.06,
  'Damian Lillard': -0.72,
  'Joe Harris': 21.32},
 {'Stephen Curry': 54.98,
  'Buddy Hield': 15.58,
  'Duncan Robinson': 51.66,
  'Damian Lillard': 54.1,
  'Joe Harris': 43.76},
 {'Stephen Curry': 49.22,
  'Buddy Hield': 5.68,
  'Duncan Robinson': 25.24,
  'Damian Lillard': 31.8,
  'Joe Harris': 43.3},
 {'Stephen Curry': 32.1,
  'Buddy Hield': 15.12,
  'Duncan Robinson': 9.38,
  'Damian Lillard': 28.96,
  'Joe Harris': 40.14},
 {'Stephen Curry': 13.2,
  'Buddy Hield': 10.36,
  'Duncan Robinson': 12.44,
  'Damian Lillard': -12.02,
  'Joe Harris': 15.8}]

Upvotes: 1

Views: 98

Answers (1)

Riley
Riley

Reputation: 2261

This is essentially the Assignment Problem. This particular instance is small enough to brute force.

from itertools import permutations
import numpy as np

arr = np.array(
    [[-9.92, 23.04, -6.06, -0.72, 21.32],
     [54.98, 15.58, 51.66, 54.1, 43.76],
     [49.22, 5.68, 25.24, 31.8, 43.3],
     [32.1, 15.12, 9.38, 28.96, 40.14],
     [13.2, 10.36, 12.44, -12.02, 15.8]]
)

best_perm = None
best_sum = -np.inf
for perm in permutations(range(5)):
    s = sum([arr[perm[i],i] for i in range(5)])
    if s > best_sum:
        best_sum = s
        best_perm = perm

the result

best_sum = 178.94

best_perm = (2, 0, 4, 1, 3)

meaning the best solution is given by choosing 3rd row for column 1, 1st row for column 2, 5th row for column 3, 2nd row for column 4, and 4th row for column 5.

Generalising the problem to a large number of columns and rows could be intractable, in which case you could try integer programming with PuLP

Upvotes: 1

Related Questions