Reputation: 2116
I am trying to implement a (three-way) tie-breaking procedure that you might see in (US) sports. I want to first sort by wins and if tied use the head to head tiebreaker.
I need this to be the runtime to be as fast as possible, I don't really care about the memory requirements. If there is a better way to represent my data such that it is easy that is also a useful answer.
The data I want to sort has at most 15 values, so the runtime isn't bad in that regard, I just want to do this 100k times.
Pseudo-code would look something like this:
Iterator = 0
maxVal = max value of wins
maxes = teams with wins == maxVal
If len(maxes) == 1
rank[values] = iterator
iterator += 1
sort(restOfData)
Else
# H2Hwins computes the amount of wins for teams currently tied incase of 2 or more teams tied
counts = sorted([(h2hwins(t, maxes), pointDifferential) for team in maxes])
for c in counts
rank[value] = iterator
iterator += 1
sort(restOfData)
return rank
So if I had the following inputs, these would be the outputs:
# Columns are Team, Wins, H2H Tiebreaks, Point Differential
# Lakers win tie based on H2H with Clippers
testData = [['Lakers', 48, ['Clippers'], 6], ['Clippers', 48, ['Warriors'], 8], ['Warriors', 47, ['Lakers'], 10]]
magicSort(testData)
>>> ['Lakers', 'Clippers', 'Warriors']
# Warriors have 2 H2H tiebreakers so they are 1st. Lakers have 1 H2H tiebreaker so they are 2nd.
testData2 = [['Lakers', 48, ['Clippers'], 6], ['Clippers', 48, [''], 8], ['Warriors', 48, ['Lakers', 'Clippers'], 10]]
magicSort(testData2)
>>> ['Warriors', 'Lakers', 'Clippers']
# All 3 are tied so we default to point differential
testData3 = [['Lakers', 47, ['Clippers'], 6], ['Clippers', 47, ['Warriors'], 8], ['Warriors', 47, ['Lakers'], 10]]
magicSort(testData3)
>>> ['Warriors', 'Clippers', 'Lakers']
I can come up with more test cases if needed, but I believe this covers edge cases
Upvotes: 3
Views: 710
Reputation: 33960
Updated answer: you want a sort algorithm which breaks three-way ties in your defined order of fields: a) Wins b) Number of H2H Tiebreaks c) Point Differential
I recommend using pandas anytime you want to do something complicated using data (such as a multi-key sort, like this). First, we have to massage your strange data format (recursively nested list) into a useable form to build a dataframe:
Team, Wins, H2H_Tiebreaks, Point_Differential
H2H_Tiebreaks
should be a tuple, even if it has length 1 or 0. Strictly we only care about its length (Num_H2H_Ties
), not its contentsdf.sort_values(by=['Wins','Num_H2H_Ties', 'Point_Differential'], ascending=False)
. Code at bottom:
.iloc[0]
.iloc[0, 0]
Solution:
import pandas as pd
cols = ['Team', 'Wins', 'H2H_Tiebreaks', 'Point_Differential']
# 1) Lakers win tie based on H2H with Clippers
dat = [('Lakers', 48, ('Clippers',), 6), ('Clippers', 48, ('Warriors',), 8), ('Warriors', 47, ('Lakers',), 10)]
df1 = pd.DataFrame(data=dat, columns=cols)
# 2) Warriors have 2 H2H tiebreakers so they are 1st. Lakers have 1 H2H tiebreaker so they are 2nd.
dat2 = [('Lakers', 48, ('Clippers',), 6), ('Clippers', 48, (), 8), ('Warriors', 48, ('Lakers', 'Clippers'), 10)]
df2 = pd.DataFrame(data=dat2, columns=cols)
# 3) All 3 are tied so we default to point-differential
dat3 = [('Lakers', 47, ('Clippers',), 6), ('Clippers', 47, ('Warriors',), 8), ('Warriors', 47, ('Lakers',), 10)]
df3 = pd.DataFrame(data=dat3, columns=cols)
############
df1['Num_H2H_Ties'] = df1['H2H_Tiebreaks'].apply(len)
df1.sort_values(by=['Wins','Num_H2H_Ties', 'Point_Differential'], ascending=False)
# Result:
Team Wins H2H_Tiebreaks Point_Differential Num_H2H_Ties
1 Clippers 48 (Warriors,) 8 1
0 Lakers 48 (Clippers,) 6 1
2 Warriors 47 (Lakers,) 10 1
############
df2['Num_H2H_Ties'] = df2['H2H_Tiebreaks'].apply(len)
df2.sort_values(by=['Wins','Num_H2H_Ties', 'Point_Differential'], ascending=False)
# Result:
Team Wins H2H_Tiebreaks Point_Differential Num_H2H_Ties
2 Warriors 48 (Lakers, Clippers) 10 2
0 Lakers 48 (Clippers,) 6 1
1 Clippers 48 () 8 0
############
df3['Num_H2H_Ties'] = df3['H2H_Tiebreaks'].apply(len)
df3.sort_values(by=['Wins','Num_H2H_Ties', 'Point_Differential'], ascending=False)
# Result:
Team Wins H2H_Tiebreaks Point_Differential Num_H2H_Ties
2 Warriors 47 (Lakers,) 10 1
1 Clippers 47 (Warriors,) 8 1
0 Lakers 47 (Clippers,) 6 1
and here as a function:
def sort_nway_tiebreaker(df):
# Filter only teams with max-Wins
df = df[df['Wins'] == df['Wins'].max()]
df['Num_H2H_Ties'] = df['H2H_Tiebreaks'].apply(len)
df = df.sort_values(by=['Wins','Num_H2H_Ties', 'Point_Differential'], ascending=False)
return df.iloc[0]
Upvotes: 3