Scott Woodhall
Scott Woodhall

Reputation: 305

Linear Optimization with duplication allowed / issue with adding constraints

The problem I'm trying to solve is fairly complicated. It doesn't have to be an exact solution but more so, one that's as close as I can get.

The overview of the problem is as follows, I have 125,000 rows of different combinations of names. Sample of that data listed below:

0          James Harden  Jalen Suggs       Josh Giddey    Scottie Barnes    Christian Wood   Shai Gilgeous-Alexander  Michael Porter Jr.           Nikola Vucevic
1          James Harden  Jalen Suggs        Kyle Kuzma    Scottie Barnes    Christian Wood   Shai Gilgeous-Alexander         Josh Giddey           Nikola Vucevic
2           Jalen Suggs  Jalen Green       Josh Giddey     Tobias Harris    Christian Wood              James Harden  Michael Porter Jr.  Shai Gilgeous-Alexander
3           Jalen Suggs  Jalen Green    Scottie Barnes  Domantas Sabonis    Christian Wood              James Harden         Josh Giddey  Shai Gilgeous-Alexander
4           Jalen Suggs  Jalen Green        Kyle Kuzma     Tobias Harris    Christian Wood              James Harden         Josh Giddey  Shai Gilgeous-Alexander

I also have a list of the unique names and the maximum amount of times they can appear in the output subset. Example below:

79              Jalen Suggs    PG/SG        ORL          43.0
78              Josh Giddey       SF        OKC          41.0
33  Shai Gilgeous-Alexander       PG        OKC          39.0
27           Christian Wood        C        HOU          36.0
69              Jalen Green    PG/SG        HOU          32.0

The solver would choose the subset of rows until it meets all constraints above. In this example, 43 rows of the output subset would have "Jalen Suggs" in them, 41 would have "Josh Giddey", so on and so forth. Eventually it would come to a solution that meets all of the name constraints and output a subset of rows that I can use. The kicker to all of this is, the same row CAN be used multiple times if it fits within the name constraints. If we simplify the problem to a super basic example, let's say all I needed was the solver to find me 1 lineup with both "Jalen Suggs" & "Josh Giddey" and return me that row.

I'm hoping my output looks identical to my first snippet of code, except all constraints from my second snippet are met.

As far as code goes, I haven't got very far, but here's what I have:

prob = LpProblem('LineUp Optimization', LpMaximize)
## make all rows into nested list
lineups = df.values.tolist()

cap = dict(zip(players, player_df['Cap']))

lineup_var = LpVariable.dicts('Lineups', lineups, 0, cat='Integer')

Maybe I'm going about this problem in the wrong way, I'm not sure.

Upvotes: 0

Views: 64

Answers (1)

borderline27
borderline27

Reputation: 116

Late reply but you can just loop through the unique players and add a constraint that says that the set of rows chosen must include each players name the specified number of times, and then (if I understood correctly) minimize the number of rows used in the objective. E.g.

from pulp import *

prob = LpProblem('LineUp Optimization', LpMinimize)

# Players
players = [[79,"JalenSuggs","PG/SG","ORL",1.0],
[78,"JoshGiddey","SF","OKC",1.0],
[33,"ShaiGilgeous-Alexander","PG","OKC",3.0],
[27,"ChristianWood","C","HOU",1.0],
[69,"JalenGreen","PG/SG","HOU",2.0]]

# Rows
rows = [[0,"JamesHarden","JalenSuggs","JoshGiddey","ScottieBarnes","ChristianWood","ShaiGilgeous-Alexander","MichaelPorterJr.","NikolaVucevic"],
[1,"JamesHarden","JalenSuggs","KyleKuzma","ScottieBarnes","ChristianWood","ShaiGilgeous-Alexander","JoshGiddey","NikolaVucevic"],
[2,"JalenSuggs","JalenGreen","JoshGiddey","TobiasHarris","ChristianWood","JamesHarden","MichaelPorterJr.","ShaiGilgeous-Alexander"],
[3,"JalenSuggs","JalenGreen","ScottieBarnes","DomantasSabonis","ChristianWood","JamesHarden","JoshGiddey","ShaiGilgeous-Alexander"],
[4,"JalenSuggs","JalenGreen","KyleKuzma","TobiasHarris","ChristianWood","JamesHarden","JoshGiddey","ShaiGilgeous-Alexander"]]

lineup = LpVariable.dicts('Lineups', range(len(rows)), 0, None, cat=LpInteger)

for p in players:
    rows_list = [i for i,j in enumerate(rows) if p[1] in j] # Get all rows the player appears in
    prob += lpSum(lineup[r] for r in rows_list) >= p[4] # From this set of rows, we need enough to get that players name at least p[4] times

prob += lpSum(lineup[r] for r in range(len(rows))), "Minimize number of rows used" 

prob.solve(PULP_CBC_CMD(msg=True))

for r in range(len(rows)):
    if lineup[r].varValue > 0:
        print(f"{lineup[r].varValue}x{rows[r]}")

This will find the smallest subset of rows that taken together contain at least the number of each player name from players. Example output:

Result - Optimal solution found

Objective value:                3.00000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.01
Time (Wallclock seconds):       0.00

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.01   (Wallclock seconds):       0.01

1.0x[0, 'JamesHarden', 'JalenSuggs', 'JoshGiddey', 'ScottieBarnes', 'ChristianWood', 'ShaiGilgeous-Alexander', 'MichaelPorterJr.', 'NikolaVucevic']
2.0x[2, 'JalenSuggs', 'JalenGreen', 'JoshGiddey', 'TobiasHarris', 'ChristianWood', 'JamesHarden', 'MichaelPorterJr.', 'ShaiGilgeous-Alexander']

If instead you want to maximize the number of unique rows, without having more than the set number of each player, then:

prob = LpProblem('LineUp Optimization', LpMaximize)
...
lineup_count = LpVariable.dicts('Lineups', range(len(rows)), 0, None, cat=LpInteger)
lineupbool = LpVariable.dicts('Lineupbool', range(len(rows)), cat=LpBinary)

for p in players:
    rows_list = [i for i,j in enumerate(rows) if p[1] in j] # Get all rows the player appears in
    prob += lpSum(lineup_count[r] for r in rows_list) <= p[4] # From this set of rows, we need enough to get that players name at least p[4] times

prob += lpSum(lineupbool[r] for r in range(len(rows))), "Maximize number of rows used" 

for r in range(len(rows)):
    prob += lineupbool[r] <= lineup_count[r]

prob.solve(PULP_CBC_CMD(msg=True))

for r in range(len(rows)):
    if lineup_count[r].varValue > 0:
        print(f"{lineup_count[r].varValue}x{rows[r]}")

Output:

Result - Optimal solution found

Objective value:                5.00000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.01
Time (Wallclock seconds):       0.01

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.01   (Wallclock seconds):       0.01

1.0x[0, 'JamesHarden', 'JalenSuggs', 'JoshGiddey', 'ScottieBarnes', 'ChristianWood', 'ShaiGilgeous-Alexander', 'MichaelPorterJr.', 'NikolaVucevic']
32.0x[1, 'JamesHarden', 'JalenSuggs', 'KyleKuzma', 'ScottieBarnes', 'ChristianWood', 'ShaiGilgeous-Alexander', 'JoshGiddey', 'NikolaVucevic']
1.0x[2, 'JalenSuggs', 'JalenGreen', 'JoshGiddey', 'TobiasHarris', 'ChristianWood', 'JamesHarden', 'MichaelPorterJr.', 'ShaiGilgeous-Alexander']
1.0x[3, 'JalenSuggs', 'JalenGreen', 'ScottieBarnes', 'DomantasSabonis', 'ChristianWood', 'JamesHarden', 'JoshGiddey', 'ShaiGilgeous-Alexander']
1.0x[4, 'JalenSuggs', 'JalenGreen', 'KyleKuzma', 'TobiasHarris', 'ChristianWood', 'JamesHarden', 'JoshGiddey', 'ShaiGilgeous-Alexander']

Upvotes: 1

Related Questions