Reputation: 164
I have the following data definition about a football game:
Game = namedtuple('Game', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
'HomeCorners', 'AwayCorners', 'HomeGoals',
'AwayGoals', 'HomeXG', 'AwayXG'])
Here are some examples:
[Game(Date=datetime.date(2018, 10, 21), Home='Everton', Away='Crystal Palace', HomeShots='21', AwayShots='6', HomeBT='22', AwayBT='13', HomeCrosses='21', AwayCrosses='14', HomeCorners='10', AwayCorners='5', HomeGoals='2', AwayGoals='0', HomeXG='1.93', AwayXG='1.5'),
Game(Date=datetime.date(2019, 2, 27), Home='Man City', Away='West Ham', HomeShots='20', AwayShots='2', HomeBT='51', AwayBT='6', HomeCrosses='34', AwayCrosses='5', HomeCorners='12', AwayCorners='2', HomeGoals='1', AwayGoals='0', HomeXG='3.68', AwayXG='0.4'),
Game(Date=datetime.date(2019, 2, 9), Home='Fulham', Away='Man Utd', HomeShots='12', AwayShots='15', HomeBT='19', AwayBT='38', HomeCrosses='20', AwayCrosses='12', HomeCorners='5', AwayCorners='4', HomeGoals='0', AwayGoals='3', HomeXG='2.19', AwayXG='2.13'),
Game(Date=datetime.date(2019, 3, 9), Home='Southampton', Away='Tottenham', HomeShots='12', AwayShots='15', HomeBT='13', AwayBT='17', HomeCrosses='15', AwayCrosses='15', HomeCorners='1', AwayCorners='10', HomeGoals='2', AwayGoals='1', HomeXG='2.08', AwayXG='1.27'),
Game(Date=datetime.date(2018, 9, 22), Home='Man Utd', Away='Wolverhampton', HomeShots='16', AwayShots='11', HomeBT='17', AwayBT='17', HomeCrosses='26', AwayCrosses='13', HomeCorners='5', AwayCorners='4', HomeGoals='1', AwayGoals='1', HomeXG='0.62', AwayXG='1.12')]
I also have two almost identical functions that calculate home and away statistics for a given team:
from collections import defaultdict

def calculate_home_stats(team, games):
    """
    Calculates home stats for the given team.
    """
    home_stats = defaultdict(float)
    home_stats['HomeShotsFor'] = sum(int(game.HomeShots) for game in games if game.Home == team)
    home_stats['HomeShotsAgainst'] = sum(int(game.AwayShots) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesFor'] = sum(int(game.HomeBT) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesAgainst'] = sum(int(game.AwayBT) for game in games if game.Home == team)
    home_stats['HomeCrossesFor'] = sum(int(game.HomeCrosses) for game in games if game.Home == team)
    home_stats['HomeCrossesAgainst'] = sum(int(game.AwayCrosses) for game in games if game.Home == team)
    home_stats['HomeCornersFor'] = sum(int(game.HomeCorners) for game in games if game.Home == team)
    home_stats['HomeCornersAgainst'] = sum(int(game.AwayCorners) for game in games if game.Home == team)
    home_stats['HomeGoalsFor'] = sum(int(game.HomeGoals) for game in games if game.Home == team)
    home_stats['HomeGoalsAgainst'] = sum(int(game.AwayGoals) for game in games if game.Home == team)
    home_stats['HomeXGoalsFor'] = sum(float(game.HomeXG) for game in games if game.Home == team)
    home_stats['HomeXGoalsAgainst'] = sum(float(game.AwayXG) for game in games if game.Home == team)
    home_stats['HomeGames'] = sum(1 for game in games if game.Home == team)
    return home_stats

def calculate_away_stats(team, games):
    """
    Calculates away stats for the given team.
    """
    away_stats = defaultdict(float)
    away_stats['AwayShotsFor'] = sum(int(game.AwayShots) for game in games if game.Away == team)
    away_stats['AwayShotsAgainst'] = sum(int(game.HomeShots) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesFor'] = sum(int(game.AwayBT) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesAgainst'] = sum(int(game.HomeBT) for game in games if game.Away == team)
    away_stats['AwayCrossesFor'] = sum(int(game.AwayCrosses) for game in games if game.Away == team)
    away_stats['AwayCrossesAgainst'] = sum(int(game.HomeCrosses) for game in games if game.Away == team)
    away_stats['AwayCornersFor'] = sum(int(game.AwayCorners) for game in games if game.Away == team)
    away_stats['AwayCornersAgainst'] = sum(int(game.HomeCorners) for game in games if game.Away == team)
    away_stats['AwayGoalsFor'] = sum(int(game.AwayGoals) for game in games if game.Away == team)
    away_stats['AwayGoalsAgainst'] = sum(int(game.HomeGoals) for game in games if game.Away == team)
    away_stats['AwayXGoalsFor'] = sum(float(game.AwayXG) for game in games if game.Away == team)
    away_stats['AwayXGoalsAgainst'] = sum(float(game.HomeXG) for game in games if game.Away == team)
    away_stats['AwayGames'] = sum(1 for game in games if game.Away == team)
    return away_stats
I'm wondering if there is a way to abstract over these two functions and merge them into one without creating a wall of if/else statements to determine whether the team plays at home or away from home and which fields should be counted.
Upvotes: 0
Views: 169
Reputation: 10942
A cleaner data structure allows you to write simpler code.
In your case, the data already contains duplication (e.g., you have both HomeShots and AwayShots).
There are many ways you could structure the data here. I'll just go over a solution that doesn't change too much from your original structure.
from collections import namedtuple

Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])
You could use it like this (I haven't included all the stats here, just a few to give an example):
def calculate_stats(games, team_name, home_stats_only=False, away_stats_only=False):
    home_stats = [g.home_stats._asdict() for g in games if g.home == team_name]
    away_stats = [g.away_stats._asdict() for g in games if g.away == team_name]
    if away_stats_only:
        input_stats = away_stats
    elif home_stats_only:
        input_stats = home_stats
    else:
        input_stats = home_stats + away_stats

    def sum_on_field(field_name):
        return sum(stats[field_name] for stats in input_stats)

    return {f: sum_on_field(f) for f in Statistics._fields}
This can then be used to get the combined, home-only, or away-only stats:
from datetime import datetime

example_game_1 = Game(
    home='Burnley',
    away='Arsenal',
    date=datetime.now(),
    home_stats=Statistics(shots=12, BT=26, crosses=21, corners=4, goals=1, XG=1.73),
    away_stats=Statistics(shots=17, BT=26, crosses=22, corners=5, goals=3, XG=2.87),
)
example_game_2 = Game(
    home='Arsenal',
    away='Pessac',
    date=datetime.now(),
    home_stats=Statistics(shots=1, BT=1, crosses=1, corners=1, goals=1, XG=1),
    away_stats=Statistics(shots=2, BT=2, crosses=2, corners=2, goals=2, XG=2),
)
print(calculate_stats([example_game_1, example_game_2], 'Arsenal'))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', home_stats_only=True))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', away_stats_only=True))
Which prints:
{'shots': 18, 'BT': 27, 'crosses': 23, 'corners': 6, 'goals': 4, 'XG': 3.87}
{'shots': 1, 'BT': 1, 'crosses': 1, 'corners': 1, 'goals': 1, 'XG': 1}
{'shots': 17, 'BT': 26, 'crosses': 22, 'corners': 5, 'goals': 3, 'XG': 2.87}
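The calculate_stats function above only adds up the team's own numbers, while the original functions also compute the "Against" figures and a game count. A rough sketch of how that could look with the nested structure, assuming the data still arrives in the question's flat format (called FlatGame here to avoid a name clash) with numbers stored as strings. The helpers to_nested, side_stats and for_against, the FLAT_FIELDS table and the side parameter are hypothetical names, not part of the answer above:

```python
from collections import namedtuple
from datetime import date

# Nested structure from the answer above
Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])

# Flat record from the question, renamed here so it does not clash with Game
FlatGame = namedtuple('FlatGame', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                                   'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                                   'HomeCorners', 'AwayCorners', 'HomeGoals',
                                   'AwayGoals', 'HomeXG', 'AwayXG'])

# Hypothetical table: Statistics field -> (suffix of the flat field, parser for the string)
FLAT_FIELDS = {'shots': ('Shots', int), 'BT': ('BT', int), 'crosses': ('Crosses', int),
               'corners': ('Corners', int), 'goals': ('Goals', int), 'XG': ('XG', float)}


def side_stats(flat, side):
    """Build a Statistics tuple for one side ('Home' or 'Away') of a flat game."""
    return Statistics(**{field: parse(getattr(flat, side + suffix))
                         for field, (suffix, parse) in FLAT_FIELDS.items()})


def to_nested(flat):
    """Convert a question-style FlatGame into the nested Game structure."""
    return Game(home=flat.Home, away=flat.Away, date=flat.Date,
                home_stats=side_stats(flat, 'Home'),
                away_stats=side_stats(flat, 'Away'))


def for_against(games, team_name, side=None):
    """Sum the team's own stats ('for'), its opponents' stats ('against') and count games.

    side=None counts every game; 'home' or 'away' restricts the totals to one side.
    """
    totals = {'for': dict.fromkeys(Statistics._fields, 0),
              'against': dict.fromkeys(Statistics._fields, 0),
              'games': 0}
    for g in games:
        if g.home == team_name and side in (None, 'home'):
            own, other = g.home_stats, g.away_stats
        elif g.away == team_name and side in (None, 'away'):
            own, other = g.away_stats, g.home_stats
        else:
            continue
        for field in Statistics._fields:
            totals['for'][field] += getattr(own, field)
            totals['against'][field] += getattr(other, field)
        totals['games'] += 1
    return totals


flat_games = [
    FlatGame(Date=date(2019, 2, 9), Home='Fulham', Away='Man Utd', HomeShots='12',
             AwayShots='15', HomeBT='19', AwayBT='38', HomeCrosses='20', AwayCrosses='12',
             HomeCorners='5', AwayCorners='4', HomeGoals='0', AwayGoals='3',
             HomeXG='2.19', AwayXG='2.13'),
    FlatGame(Date=date(2018, 9, 22), Home='Man Utd', Away='Wolverhampton', HomeShots='16',
             AwayShots='11', HomeBT='17', AwayBT='17', HomeCrosses='26', AwayCrosses='13',
             HomeCorners='5', AwayCorners='4', HomeGoals='1', AwayGoals='1',
             HomeXG='0.62', AwayXG='1.12'),
]
games = [to_nested(g) for g in flat_games]
totals = for_against(games, 'Man Utd')
print(totals['for']['shots'], totals['against']['shots'], totals['games'])  # 31 23 2
```

Swapping own and other depending on which side the team played is what replaces the duplicated Home/Away field names, so the For/Against logic is written only once.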
When dealing with this kind of data, it's usually a good idea to use specialised tools such as pandas. Interactive tools like JupyterLab can also be very convenient.
Upvotes: 1
Reputation: 121
I recommend using a plain tuple together with mapping dictionaries instead of a named tuple, for example:
game=(datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26', '21', '22', '4', '5', '1', '3', '1.73', '2.87')
And two mapping dictionaries:
numtostr = {0: 'Date', 1: 'Home', 2: 'Away', 3: 'HomeShots', 4: 'AwayShots', 5: 'HomeBT', 6: 'AwayBT', 7: 'HomeCrosses', 8: 'AwayCrosses', 9: 'HomeCorners', 10: 'AwayCorners', 11: 'HomeGoals', 12: 'AwayGoals', 13: 'HomeXG', 14: 'AwayXG'}
strtonum = {'Date': 0, 'Home': 1, 'Away': 2, 'HomeShots': 3, 'AwayShots': 4, 'HomeBT': 5, 'AwayBT': 6, 'HomeCrosses': 7, 'AwayCrosses': 8, 'HomeCorners': 9, 'AwayCorners': 10, 'HomeGoals': 11, 'AwayGoals': 12, 'HomeXG': 13, 'AwayXG': 14}
Make similar mapping dictionaries for home_stats and away_stats (e.g. {0: 'HomeShotsFor', 1: 'HomeShotsAgainst', ...} for home_stats). To illustrate how the mapping dictionaries work: if you want to get the HomeCrosses of a game, you can use either
game[7]
or
game[strtonum['HomeCrosses']]
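Writing the two mapping dictionaries out by hand makes it easy for them to fall out of sync, for instance by leaving out a field such as AwayXG. A small sketch, assuming the tuple order shown above: generate both dictionaries from a single tuple of field names with enumerate (FIELDS is a hypothetical name):

```python
import datetime

# Hypothetical single source of truth for the field order of a plain game tuple
FIELDS = ('Date', 'Home', 'Away', 'HomeShots', 'AwayShots', 'HomeBT', 'AwayBT',
          'HomeCrosses', 'AwayCrosses', 'HomeCorners', 'AwayCorners',
          'HomeGoals', 'AwayGoals', 'HomeXG', 'AwayXG')

# Both lookups are derived from FIELDS, so they always cover the same fields
numtostr = dict(enumerate(FIELDS))
strtonum = {name: index for index, name in enumerate(FIELDS)}

game = (datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26',
        '21', '22', '4', '5', '1', '3', '1.73', '2.87')

print(game[strtonum['HomeCrosses']])  # 21
print(numtostr[7])                    # HomeCrosses
```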
Then the functions:
def calculate_home_stats(team, games):
    home_stats = [0] * 13
    for game in games:
        if game[1] == team:
            for index in range(12):
                # Because you just put the sum of everything except date, home, and away,
                # which are the first 3 indices. See how this cleans everything up?
                # The values are stored as strings, so convert them before adding.
                home_stats[index] += float(game[index + 3])
            home_stats[12] += 1  # number of home games
    return home_stats
def calculate_away_stats(team, games):
    away_stats = [0] * 13
    for game in games:
        if game[2] == team:
            for index in range(12):
                away_stats[index] += float(game[index + 3])
            away_stats[12] += 1  # number of away games
    return away_stats
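If you really want to merge both functions into one, you can do this:
def calculate_stats(team, games, homeaway):
    # homeaway is 'Home' or 'Away' and selects the tuple index that holds the team name
    team_index = {'Home': 1, 'Away': 2}[homeaway]
    stats = [0] * 13
    for game in games:
        if game[team_index] == team:
            for index in range(12):
                stats[index] += float(game[index + 3])
            stats[12] += 1
    return stats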
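With this function, the only thing that changes between home and away is the index used to check the team, instead of redundant if/else statements that require a lot of changes. Note that the stats are always accumulated in the tuple's home-first order, so for away games index 0 holds the opponent's shots (AwayShotsAgainst), index 1 the team's own shots (AwayShotsFor), and so on; the away_stats mapping dictionary has to reflect that order.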
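The same "only the side changes" idea also works with the question's original namedtuple, without switching to plain tuples: treat the team's side and the opponent's side as the strings 'Home' and 'Away' and look the fields up by name with getattr. A minimal sketch, assuming the question's Game definition with string-valued fields; the STATS table and the side parameter are hypothetical, and the keys it builds are meant to match those of the original calculate_home_stats and calculate_away_stats:

```python
import datetime
from collections import defaultdict, namedtuple

Game = namedtuple('Game', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                           'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                           'HomeCorners', 'AwayCorners', 'HomeGoals',
                           'AwayGoals', 'HomeXG', 'AwayXG'])

# Hypothetical table: output stat name -> (suffix of the Game field, parser for the string)
STATS = {
    'Shots': ('Shots', int),
    'BoxTouches': ('BT', int),
    'Crosses': ('Crosses', int),
    'Corners': ('Corners', int),
    'Goals': ('Goals', int),
    'XGoals': ('XG', float),
}


def calculate_stats(team, games, side):
    """Stats for `team` in the games where it played on `side` ('Home' or 'Away')."""
    if side not in ('Home', 'Away'):
        raise ValueError("side must be 'Home' or 'Away'")
    other = 'Away' if side == 'Home' else 'Home'
    # getattr(g, 'Home') or getattr(g, 'Away') reads the team name for that side
    team_games = [g for g in games if getattr(g, side) == team]
    stats = defaultdict(float)
    for name, (suffix, parse) in STATS.items():
        # "For" reads the team's own side, "Against" reads the opponent's side
        stats[side + name + 'For'] = sum(parse(getattr(g, side + suffix)) for g in team_games)
        stats[side + name + 'Against'] = sum(parse(getattr(g, other + suffix)) for g in team_games)
    stats[side + 'Games'] = len(team_games)
    return stats


games = [
    Game(Date=datetime.date(2019, 2, 9), Home='Fulham', Away='Man Utd', HomeShots='12',
         AwayShots='15', HomeBT='19', AwayBT='38', HomeCrosses='20', AwayCrosses='12',
         HomeCorners='5', AwayCorners='4', HomeGoals='0', AwayGoals='3',
         HomeXG='2.19', AwayXG='2.13'),
    Game(Date=datetime.date(2018, 9, 22), Home='Man Utd', Away='Wolverhampton',
         HomeShots='16', AwayShots='11', HomeBT='17', AwayBT='17', HomeCrosses='26',
         AwayCrosses='13', HomeCorners='5', AwayCorners='4', HomeGoals='1',
         AwayGoals='1', HomeXG='0.62', AwayXG='1.12'),
]
print(calculate_stats('Man Utd', games, 'Home')['HomeShotsFor'])  # 16
print(calculate_stats('Man Utd', games, 'Away')['AwayGoalsFor'])  # 3
```

Because the For/Against swap comes from choosing `other`, this variant needs neither index bookkeeping nor separate home and away mapping dictionaries.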
Upvotes: 0