How to abstract over two similar functions

I have the following data definition about a football game:

Game = namedtuple('Game', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                           'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                           'HomeCorners', 'AwayCorners', 'HomeGoals',
                           'AwayGoals', 'HomeXG', 'AwayXG'])

Here are some examples:

[Game(Date=datetime.date(2018, 10, 21), Home='Everton', Away='Crystal Palace', HomeShots='21', AwayShots='6', HomeBT='22', AwayBT='13', HomeCrosses='21', AwayCrosses='14', HomeCorners='10', AwayCorners='5', HomeGoals='2', AwayGoals='0', HomeXG='1.93', AwayXG='1.5'),
 Game(Date=datetime.date(2019, 2, 27), Home='Man City', Away='West Ham', HomeShots='20', AwayShots='2', HomeBT='51', AwayBT='6', HomeCrosses='34', AwayCrosses='5', HomeCorners='12', AwayCorners='2', HomeGoals='1', AwayGoals='0', HomeXG='3.68', AwayXG='0.4'),
 Game(Date=datetime.date(2019, 2, 9), Home='Fulham', Away='Man Utd', HomeShots='12', AwayShots='15', HomeBT='19', AwayBT='38', HomeCrosses='20', AwayCrosses='12', HomeCorners='5', AwayCorners='4', HomeGoals='0', AwayGoals='3', HomeXG='2.19', AwayXG='2.13'),
 Game(Date=datetime.date(2019, 3, 9), Home='Southampton', Away='Tottenham', HomeShots='12', AwayShots='15', HomeBT='13', AwayBT='17', HomeCrosses='15', AwayCrosses='15', HomeCorners='1', AwayCorners='10', HomeGoals='2', AwayGoals='1', HomeXG='2.08', AwayXG='1.27'),
 Game(Date=datetime.date(2018, 9, 22), Home='Man Utd', Away='Wolverhampton', HomeShots='16', AwayShots='11', HomeBT='17', AwayBT='17', HomeCrosses='26', AwayCrosses='13', HomeCorners='5', AwayCorners='4', HomeGoals='1', AwayGoals='1', HomeXG='0.62', AwayXG='1.12')]

I also have two almost identical functions that calculate home and away statistics for a given team:

def calculate_home_stats(team, games):
    """
    Calculates home stats for the given team.
    """
    home_stats = defaultdict(float)

    home_stats['HomeShotsFor'] = sum(int(game.HomeShots) for game in games if game.Home == team)
    home_stats['HomeShotsAgainst'] = sum(int(game.AwayShots) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesFor'] = sum(int(game.HomeBT) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesAgainst'] = sum(int(game.AwayBT) for game in games if game.Home == team)
    home_stats['HomeCrossesFor'] = sum(int(game.HomeCrosses) for game in games if game.Home == team)
    home_stats['HomeCrossesAgainst'] = sum(int(game.AwayCrosses) for game in games if game.Home == team)
    home_stats['HomeCornersFor'] = sum(int(game.HomeCorners) for game in games if game.Home == team)
    home_stats['HomeCornersAgainst'] = sum(int(game.AwayCorners) for game in games if game.Home == team)
    home_stats['HomeGoalsFor'] = sum(int(game.HomeGoals) for game in games if game.Home == team)
    home_stats['HomeGoalsAgainst'] = sum(int(game.AwayGoals) for game in games if game.Home == team)
    home_stats['HomeXGoalsFor'] = sum(float(game.HomeXG) for game in games if game.Home == team)
    home_stats['HomeXGoalsAgainst'] = sum(float(game.AwayXG) for game in games if game.Home == team)
    home_stats['HomeGames'] = sum(1 for game in games if game.Home == team)

    return home_stats


def calculate_away_stats(team, games):
    """
    Calculates away stats for the given team.
    """
    away_stats = defaultdict(float)

    away_stats['AwayShotsFor'] = sum(int(game.AwayShots) for game in games if game.Away == team)
    away_stats['AwayShotsAgainst'] = sum(int(game.HomeShots) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesFor'] = sum(int(game.AwayBT) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesAgainst'] = sum(int(game.HomeBT) for game in games if game.Away == team)
    away_stats['AwayCrossesFor'] = sum(int(game.AwayCrosses) for game in games if game.Away == team)
    away_stats['AwayCrossesAgainst'] = sum(int(game.HomeCrosses) for game in games if game.Away == team)
    away_stats['AwayCornersFor'] = sum(int(game.AwayCorners) for game in games if game.Away == team)
    away_stats['AwayCornersAgainst'] = sum(int(game.HomeCorners) for game in games if game.Away == team)
    away_stats['AwayGoalsFor'] = sum(int(game.AwayGoals) for game in games if game.Away == team)
    away_stats['AwayGoalsAgainst'] = sum(int(game.HomeGoals) for game in games if game.Away == team)
    away_stats['AwayXGoalsFor'] = sum(float(game.AwayXG) for game in games if game.Away == team)
    away_stats['AwayXGoalsAgainst'] = sum(float(game.HomeXG) for game in games if game.Away == team)
    away_stats['AwayGames'] = sum(1 for game in games if game.Away == team)

    return away_stats

I'm wondering if there is a way to abstract over these two functions and merge them into one without creating a wall of if/else statements to determine whether the team plays at home or away from home and which fields should be counted.

Upvotes: 0

Views: 169

Answers (2)

cglacet

Reputation: 10942

A cleaner data structure allows for simpler code. In this case, your data already contains duplication (e.g., you have both HomeShots and AwayShots).

There are many possible answers to how you could structure data here. I'll just go over a solution that doesn't change too much from your original structure.

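from collections import namedtuple
from datetime import datetime  # used by the example games further down
Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])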

You could use it like this (I haven't included all stats here, just a few to give an example):

def calculate_stats(games, team_name, home_stats_only=False, away_stats_only=False):

    home_stats = [g.home_stats._asdict() for g in games if g.home == team_name]
    away_stats = [g.away_stats._asdict() for g in games if g.away == team_name]

    if away_stats_only:
        input_stats = away_stats
    elif home_stats_only:
        input_stats = home_stats
    else:
        input_stats = home_stats + away_stats

    def sum_on_field(field_name):
        return sum(stats[field_name] for stats in input_stats)

    return {f:sum_on_field(f) for f in Statistics._fields}

Which can then be used to get both away/home stats:

example_game_1 = Game(
    home='Burnley', 
    away='Arsenal',
    date=datetime.now(),
    home_stats=Statistics(shots=12, BT=26, crosses=21, corners=4, goals=1, XG=1.73),
    away_stats=Statistics(shots=17, BT=26, crosses=22, corners=5, goals=3, XG=2.87),
)

example_game_2 = Game(
    home='Arsenal',
    away='Pessac',
    date=datetime.now(),
    home_stats=Statistics(shots=1, BT=1, crosses=1, corners=1, goals=1, XG=1),
    away_stats=Statistics(shots=2, BT=2, crosses=2, corners=2, goals=2, XG=2),
)

print(calculate_stats([example_game_1, example_game_2], 'Arsenal'))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', home_stats_only=True))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', away_stats_only=True))

Which prints:

{'shots': 18, 'BT': 27, 'crosses': 23, 'corners': 6, 'goals': 4, 'XG': 3.87}
{'shots': 1, 'BT': 1, 'crosses': 1, 'corners': 1, 'goals': 1, 'XG': 1}
{'shots': 17, 'BT': 26, 'crosses': 22, 'corners': 5, 'goals': 3, 'XG': 2.87}
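
To get from the flat records in the question to this nested layout, a conversion along the following lines might work. This is only a sketch: `FlatGame`, `FIELD_MAP`, `side_stats` and `to_nested` are names made up for illustration, and it assumes every statistic is stored as a string, as in the question's examples.

```python
from collections import namedtuple
import datetime

# Flat layout from the question (statistics stored as strings).
FlatGame = namedtuple('FlatGame', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                                   'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                                   'HomeCorners', 'AwayCorners', 'HomeGoals',
                                   'AwayGoals', 'HomeXG', 'AwayXG'])

# Nested layout from this answer.
Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])

# Hypothetical mapping: nested field name -> (flat field suffix, converter).
FIELD_MAP = {
    'shots': ('Shots', int),
    'BT': ('BT', int),
    'crosses': ('Crosses', int),
    'corners': ('Corners', int),
    'goals': ('Goals', int),
    'XG': ('XG', float),
}


def side_stats(flat_game, side):
    """Build a Statistics tuple for side 'Home' or 'Away'."""
    return Statistics(**{
        field: convert(getattr(flat_game, side + suffix))
        for field, (suffix, convert) in FIELD_MAP.items()
    })


def to_nested(flat_game):
    """Convert one flat record into the nested Game layout."""
    return Game(
        home=flat_game.Home,
        away=flat_game.Away,
        date=flat_game.Date,
        home_stats=side_stats(flat_game, 'Home'),
        away_stats=side_stats(flat_game, 'Away'),
    )


flat = FlatGame(datetime.date(2019, 2, 9), 'Fulham', 'Man Utd', '12', '15', '19', '38',
                '20', '12', '5', '4', '0', '3', '2.19', '2.13')
nested = to_nested(flat)
print(nested.away_stats)
```

Converting the strings once, at load time, means the summing code never has to care about `int` versus `float` again.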

When dealing with this kind of data, it's usually a good idea to use specialised tools like, for example, pandas. It could also be very convenient to use interactive tools, like JupyterLab.

Upvotes: 1

Aphrodite

Reputation: 121

I recommend not using a named tuple but a simple tuple with a dictionary, for example:

game=(datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26', '21', '22', '4', '5', '1', '3', '1.73', '2.87')

And a mapping dictionary:

numtostr={0: 'Date', 1: 'Home', 2: 'Away', 3: 'HomeShots', 4: 'AwayShots', 5: 'HomeBT', 6: 'AwayBT', 7: 'HomeCrosses', 8: 'AwayCrosses', 9: 'HomeCorners', 10: 'AwayCorners', 11: 'HomeGoals', 12: 'AwayGoals', 13: 'HomeXG', 14: 'AwayXG'}
strtonum={'Date': 0, 'Home': 1, 'Away': 2, 'HomeShots': 3, 'AwayShots': 4, 'HomeBT': 5, 'AwayBT': 6, 'HomeCrosses': 7, 'AwayCrosses': 8, 'HomeCorners': 9, 'AwayCorners': 10, 'HomeGoals': 11, 'AwayGoals': 12, 'HomeXG': 13, 'AwayXG': 14}

Make similar mapping dictionaries for home_stats and away_stats (for example, {0: 'HomeShotsFor', 1: 'HomeShotsAgainst', ...} for home_stats). To see how the mapping dictionaries work: if you want the HomeCrosses of a game, you can write

game[7]

or

game[strtonum['HomeCrosses']]
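
Rather than typing both dictionaries by hand, they could be derived from a single list of field names, which keeps the two directions in sync. A minimal sketch, assuming the field order of the question's `Game` definition (`FIELDS` is a name introduced here for illustration):

```python
import datetime

# Field names in tuple order, taken from the question's Game definition.
FIELDS = ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots', 'HomeBT', 'AwayBT',
          'HomeCrosses', 'AwayCrosses', 'HomeCorners', 'AwayCorners',
          'HomeGoals', 'AwayGoals', 'HomeXG', 'AwayXG']

# index -> name and name -> index, both built from the same list.
numtostr = dict(enumerate(FIELDS))
strtonum = {name: index for index, name in enumerate(FIELDS)}

game = (datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26',
        '21', '22', '4', '5', '1', '3', '1.73', '2.87')

# Looking a value up by name gives the same element as the raw index.
print(game[strtonum['HomeCrosses']], game[7])
```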

Then the functions:

def calculate_home_stats(team, games):
    home_stats = [0] * 13
    for game in games:
        if game[1] == team:  # index 1 is Home
            for index in range(12):
                # Sum everything except Date, Home and Away (the first 3 indices).
                # The values are stored as strings, so convert them first.
                home_stats[index] += float(game[index + 3])
            home_stats[12] += 1  # number of games
    return home_stats

def calculate_away_stats(team, games):
    away_stats = [0] * 13
    for game in games:
        if game[2] == team:  # index 2 is Away
            for index in range(12):
                away_stats[index] += float(game[index + 3])
            away_stats[12] += 1  # number of games
    return away_stats

If you really want to merge both functions into one you can do this:

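def calculate_stats(team, games, homeaway):
    stats = [0] * 13
    team_index = {'Home': 1, 'Away': 2}[homeaway]  # which column to compare with team
    for game in games:
        if game[team_index] == team:
            for index in range(12):
                stats[index] += float(game[index + 3])  # values are strings
            stats[12] += 1  # number of games
    return stats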

With this function, the only thing that changes between home and away is the index that is checked, instead of redundant if/else statements that would each need changing.
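
The same idea of switching only the lookup key can also be sketched against the question's original namedtuple, using `getattr` to build field names from a `'Home'` or `'Away'` prefix. This is one possible approach under stated assumptions, not the only way to do it: `STAT_FIELDS` and the `side` parameter are made up for illustration, and the output keys follow the question's naming (`HomeShotsFor`, `AwayXGoalsAgainst`, and so on).

```python
from collections import namedtuple
import datetime

Game = namedtuple('Game', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                           'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                           'HomeCorners', 'AwayCorners', 'HomeGoals',
                           'AwayGoals', 'HomeXG', 'AwayXG'])

# Hypothetical table: output label -> (field suffix on Game, converter).
STAT_FIELDS = {
    'Shots': ('Shots', int),
    'BoxTouches': ('BT', int),
    'Crosses': ('Crosses', int),
    'Corners': ('Corners', int),
    'Goals': ('Goals', int),
    'XGoals': ('XG', float),
}


def calculate_stats(team, games, side):
    """Sum the stats of `team` for games played on `side` ('Home' or 'Away')."""
    other = 'Away' if side == 'Home' else 'Home'
    played = [game for game in games if getattr(game, side) == team]
    stats = {}
    for label, (suffix, convert) in STAT_FIELDS.items():
        # "For" reads the team's own side, "Against" reads the opponent's side.
        stats[side + label + 'For'] = sum(convert(getattr(g, side + suffix)) for g in played)
        stats[side + label + 'Against'] = sum(convert(getattr(g, other + suffix)) for g in played)
    stats[side + 'Games'] = len(played)
    return stats


games = [
    Game(datetime.date(2019, 2, 9), 'Fulham', 'Man Utd', '12', '15', '19', '38',
         '20', '12', '5', '4', '0', '3', '2.19', '2.13'),
    Game(datetime.date(2018, 9, 22), 'Man Utd', 'Wolverhampton', '16', '11', '17', '17',
         '26', '13', '5', '4', '1', '1', '0.62', '1.12'),
]
home = calculate_stats('Man Utd', games, 'Home')
away = calculate_stats('Man Utd', games, 'Away')
print(home['HomeShotsFor'], away['AwayShotsFor'])
```

Unlike the question's functions, this returns a plain `dict` rather than a `defaultdict(float)`, so looking up a key that was never set raises `KeyError`.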

Upvotes: 0
