Reputation: 305
The problem I'm trying to solve is fairly complicated. It doesn't have to be an exact solution but more so, one that's as close as I can get.
The overview of the problem is as follows, I have 125,000 rows of different combinations of names. Sample of that data listed below:
0 James Harden Jalen Suggs Josh Giddey Scottie Barnes Christian Wood Shai Gilgeous-Alexander Michael Porter Jr. Nikola Vucevic
1 James Harden Jalen Suggs Kyle Kuzma Scottie Barnes Christian Wood Shai Gilgeous-Alexander Josh Giddey Nikola Vucevic
2 Jalen Suggs Jalen Green Josh Giddey Tobias Harris Christian Wood James Harden Michael Porter Jr. Shai Gilgeous-Alexander
3 Jalen Suggs Jalen Green Scottie Barnes Domantas Sabonis Christian Wood James Harden Josh Giddey Shai Gilgeous-Alexander
4 Jalen Suggs Jalen Green Kyle Kuzma Tobias Harris Christian Wood James Harden Josh Giddey Shai Gilgeous-Alexander
I also have a list of the unique names and the maximum amount of times they can appear in the output subset. Example below:
79 Jalen Suggs PG/SG ORL 43.0
78 Josh Giddey SF OKC 41.0
33 Shai Gilgeous-Alexander PG OKC 39.0
27 Christian Wood C HOU 36.0
69 Jalen Green PG/SG HOU 32.0
The solver would choose the subset of rows until it meets all constraints above. In this example, 43 rows of the output subset would have "Jalen Suggs" in them, 41 would have "Josh Giddey", so on and so forth. Eventually it would come to a solution that meets all of the name constraints and output a subset of rows that I can use. The kicker to all of this is, the same row CAN be used multiple times if it fits within the name constraints. If we simplify the problem to a super basic example, let's say all I needed was the solver to find me 1 lineup with both "Jalen Suggs" & "Josh Giddey" and return me that row.
I'm hoping my output looks identical to my first snippet of code, except all constraints from my second snippet are met.
As far as code goes, I haven't got very far, but here's what I have:
prob = LpProblem('LineUp Optimization', LpMaximize)
## make all rows into nested list
lineups = df.values.tolist()
cap = dict(zip(players, player_df['Cap']))
lineup_var = LpVariable.dicts('Lineups', lineups, 0, cat='Integer')
Maybe I'm going about this problem in the wrong way, I'm not sure.
Upvotes: 0
Views: 64
Reputation: 116
Late reply but you can just loop through the unique players and add a constraint that says that the set of rows chosen must include each players name the specified number of times, and then (if I understood correctly) minimize the number of rows used in the objective. E.g.
from pulp import *
prob = LpProblem('LineUp Optimization', LpMinimize)
# Players
players = [[79,"JalenSuggs","PG/SG","ORL",1.0],
[78,"JoshGiddey","SF","OKC",1.0],
[33,"ShaiGilgeous-Alexander","PG","OKC",3.0],
[27,"ChristianWood","C","HOU",1.0],
[69,"JalenGreen","PG/SG","HOU",2.0]]
# Rows
rows = [[0,"JamesHarden","JalenSuggs","JoshGiddey","ScottieBarnes","ChristianWood","ShaiGilgeous-Alexander","MichaelPorterJr.","NikolaVucevic"],
[1,"JamesHarden","JalenSuggs","KyleKuzma","ScottieBarnes","ChristianWood","ShaiGilgeous-Alexander","JoshGiddey","NikolaVucevic"],
[2,"JalenSuggs","JalenGreen","JoshGiddey","TobiasHarris","ChristianWood","JamesHarden","MichaelPorterJr.","ShaiGilgeous-Alexander"],
[3,"JalenSuggs","JalenGreen","ScottieBarnes","DomantasSabonis","ChristianWood","JamesHarden","JoshGiddey","ShaiGilgeous-Alexander"],
[4,"JalenSuggs","JalenGreen","KyleKuzma","TobiasHarris","ChristianWood","JamesHarden","JoshGiddey","ShaiGilgeous-Alexander"]]
lineup = LpVariable.dicts('Lineups', range(len(rows)), 0, None, cat=LpInteger)
for p in players:
rows_list = [i for i,j in enumerate(rows) if p[1] in j] # Get all rows the player appears in
prob += lpSum(lineup[r] for r in rows_list) >= p[4] # From this set of rows, we need enough to get that players name at least p[4] times
prob += lpSum(lineup[r] for r in range(len(rows))), "Minimize number of rows used"
prob.solve(PULP_CBC_CMD(msg=True))
for r in range(len(rows)):
if lineup[r].varValue > 0:
print(f"{lineup[r].varValue}x{rows[r]}")
This will find the smallest subset of rows that taken together contain at least the number of each player name from players. Example output:
Result - Optimal solution found
Objective value: 3.00000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.01
Time (Wallclock seconds): 0.00
Option for printingOptions changed from normal to all
Total time (CPU seconds): 0.01 (Wallclock seconds): 0.01
1.0x[0, 'JamesHarden', 'JalenSuggs', 'JoshGiddey', 'ScottieBarnes', 'ChristianWood', 'ShaiGilgeous-Alexander', 'MichaelPorterJr.', 'NikolaVucevic']
2.0x[2, 'JalenSuggs', 'JalenGreen', 'JoshGiddey', 'TobiasHarris', 'ChristianWood', 'JamesHarden', 'MichaelPorterJr.', 'ShaiGilgeous-Alexander']
If instead you want to maximize the number of unique rows, without having more than the set number of each player, then:
prob = LpProblem('LineUp Optimization', LpMaximize)
...
lineup_count = LpVariable.dicts('Lineups', range(len(rows)), 0, None, cat=LpInteger)
lineupbool = LpVariable.dicts('Lineupbool', range(len(rows)), cat=LpBinary)
for p in players:
rows_list = [i for i,j in enumerate(rows) if p[1] in j] # Get all rows the player appears in
prob += lpSum(lineup_count[r] for r in rows_list) <= p[4] # From this set of rows, we need enough to get that players name at least p[4] times
prob += lpSum(lineupbool[r] for r in range(len(rows))), "Maximize number of rows used"
for r in range(len(rows)):
prob += lineupbool[r] <= lineup_count[r]
prob.solve(PULP_CBC_CMD(msg=True))
for r in range(len(rows)):
if lineup_count[r].varValue > 0:
print(f"{lineup_count[r].varValue}x{rows[r]}")
Output:
Result - Optimal solution found
Objective value: 5.00000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.01
Time (Wallclock seconds): 0.01
Option for printingOptions changed from normal to all
Total time (CPU seconds): 0.01 (Wallclock seconds): 0.01
1.0x[0, 'JamesHarden', 'JalenSuggs', 'JoshGiddey', 'ScottieBarnes', 'ChristianWood', 'ShaiGilgeous-Alexander', 'MichaelPorterJr.', 'NikolaVucevic']
32.0x[1, 'JamesHarden', 'JalenSuggs', 'KyleKuzma', 'ScottieBarnes', 'ChristianWood', 'ShaiGilgeous-Alexander', 'JoshGiddey', 'NikolaVucevic']
1.0x[2, 'JalenSuggs', 'JalenGreen', 'JoshGiddey', 'TobiasHarris', 'ChristianWood', 'JamesHarden', 'MichaelPorterJr.', 'ShaiGilgeous-Alexander']
1.0x[3, 'JalenSuggs', 'JalenGreen', 'ScottieBarnes', 'DomantasSabonis', 'ChristianWood', 'JamesHarden', 'JoshGiddey', 'ShaiGilgeous-Alexander']
1.0x[4, 'JalenSuggs', 'JalenGreen', 'KyleKuzma', 'TobiasHarris', 'ChristianWood', 'JamesHarden', 'JoshGiddey', 'ShaiGilgeous-Alexander']
Upvotes: 1