BCmonitor

Reputation: 19

Creating a function that takes a DataFrame as an argument and returns a dictionary of scores

I am trying to create a function that takes a DataFrame as an argument, counts the total number of "league points" each team has, and returns a dictionary sorted by score. I have created functions that get me to the desired output, but I want to know how to wrap it all together in one function.

The function needs to start like this:

def func(df, w=3, l=0, d=1):

I have made a simple dataset, as the actual problem uses a larger one. Each row is a football game between two teams:

import pandas as pd
import numpy as np

data = {'Home Team':  ['Swindon', 'Bath', 'Northampton','Manchester', "Newcastle", 'Reigate'],
        'Away Team': ['Reigate', 'Manchester', 'Newcastle','Swindon', 'Bath', 'Northampton'],
        'Home Goals':[3,1,3,0,1,2],
        'Away Goals':[2,1,4,1,0,1],}

df = pd.DataFrame (data, columns = ['Home Team','Away Team','Home Goals','Away Goals','Home League Points', 'Away League Points'])

I have created a function that takes a row and will tell us how many league points each team earns from the game. I then applied this to the whole dataset which creates new columns:

Creating the function:

def leaguetablescore(row, w=3, l=0, d=1):
        if row['Home Goals'] < row['Away Goals'] :
            return [l ,  w]
        elif row['Home Goals'] == row['Away Goals']:
            return [d ,  d]
        else:
            return [w , l]

Applying the function to the dataframe:

df[['Home League Points', 'Away League Points']] = df.apply(lambda x : leaguetablescore(x), axis = 1, 
                                                        result_type ='expand')

output: DataFrame with new columns

I have then created a function that will return a dictionary of total league points that each team has using this new dataset with the new columns:

def TeamsPointsDict(df, teamslist):
    mydictionary = dict()
    for team in teamslist:
        # Points earned at home plus points earned away
        home = df.loc[df['Home Team'] == team, 'Home League Points'].sum()
        away = df.loc[df['Away Team'] == team, 'Away League Points'].sum()
        mydictionary[team] = home + away
    return mydictionary

Output (this is also the output I want from the single function I am trying to create, although it would need to be sorted):

{'Swindon': 6,
 'Bath': 1,
 'Northampton': 0,
 'Manchester': 1,
 'Newcastle': 6,
 'Reigate': 3}

However, I am curious how I could pass my DataFrame into a SINGLE function and have it return a dictionary (or DataFrame). I know I have all these steps created, but is there a way to define one function that does it all from the input? The reason I need the function to start in the specified way is that I want to be able to alter the number of league points each team earns from a Win (w), Draw (d) or Loss (l) if I need to.

I hope this makes sense, and any advice would be much appreciated! I've had fun trying to figure this out, so I hope you do too. :)

PS

Please let me know if the format of this post is OK, as I am relatively new to Stack Overflow!

Upvotes: 0

Views: 1669

Answers (2)

Nk03

Reputation: 14949

Is this what you want?

home_df = df[['Home Team','Home League Points']].rename(columns = {'Home Team': 'Team', 'Home League Points': 'Points'})
away_df = df[['Away Team','Away League Points']].rename(columns = {'Away Team': 'Team', 'Away League Points': 'Points'})
result = dict(pd.concat([home_df,away_df]).groupby('Team',as_index=False).sum().to_dict(orient='split')['data'])

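The idea is to create two separate DataFrames (home and away) and concatenate them row-wise, then group by team and sum the points, and finally convert the result to a dict. This assumes the 'Home League Points' and 'Away League Points' columns have already been filled in by your apply step.

Output: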

{'Bath': 1,
 'Manchester': 1,
 'Newcastle': 6,
 'Northampton': 0,
 'Reigate': 3,
 'Swindon': 6}
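
If you want the same concat/groupby idea wrapped in one function with configurable points, something along these lines might work. This is only a sketch: the name league_points and the use of np.select to work out the points are my own assumptions, not part of the snippet above. It reads the goal columns directly, so the 'League Points' columns do not need to exist beforehand.

```python
import numpy as np
import pandas as pd

def league_points(df, w=3, l=0, d=1):
    """Return {team: total points}, highest total first (hypothetical helper)."""
    home_win = df['Home Goals'] > df['Away Goals']
    away_win = df['Home Goals'] < df['Away Goals']
    # Points per game for each side; draws fall through to the default.
    home_pts = np.select([home_win, away_win], [w, l], default=d)
    away_pts = np.select([home_win, away_win], [l, w], default=d)
    # Stack home and away results into one long Team/Points table.
    long_df = pd.concat([
        pd.DataFrame({'Team': df['Home Team'], 'Points': home_pts}),
        pd.DataFrame({'Team': df['Away Team'], 'Points': away_pts}),
    ])
    totals = long_df.groupby('Team')['Points'].sum().sort_values(ascending=False)
    return totals.to_dict()
```

Calling league_points(df, w=2) would then re-score the same games with two points for a win. Teams on equal points may come out in either order.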

Upvotes: 1

chitown88

Reputation: 28630

I'm not quite sure what you are asking, but what I think you are saying is, you want the points awarded for a win, loss, draw to be configurable?

The 2 other functions I think are fine. To combine them into one, just create a function that takes in the dataframe then does that work.

Also, as a side note, dictionaries/JSON structures are not inherently "sorted". Since Python 3.7 a dict does remember insertion order, so you can get a "sorted" dict by inserting the items in sorted order, but it won't keep itself sorted if the values change later.
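
To illustrate the ordering point, here is a minimal sketch of sorting a dict by value. The scores dict is just the expected output from the question, and it relies on dicts preserving insertion order (Python 3.7+):

```python
scores = {'Swindon': 6, 'Bath': 1, 'Northampton': 0,
          'Manchester': 1, 'Newcastle': 6, 'Reigate': 3}

# sorted() returns (team, points) pairs ordered by points, highest first.
# The sort is stable, so teams on equal points keep their original relative order.
sorted_scores = dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
print(sorted_scores)
```

The new dict is built by inserting the pairs in sorted order, which is why it prints sorted.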

So given:

import pandas as pd
import numpy as np

data = {'Home Team':  ['Swindon', 'Bath', 'Northampton','Manchester', "Newcastle", 'Reigate'],
        'Away Team': ['Reigate', 'Manchester', 'Newcastle','Swindon', 'Bath', 'Northampton'],
        'Home Goals':[3,1,3,0,1,2],
        'Away Goals':[2,1,4,1,0,1],}

df = pd.DataFrame (data, columns = ['Home Team','Away Team','Home Goals','Away Goals'])




def leaguetablescore(row, points_awarded):
        if row['Home Goals'] < row['Away Goals'] :
            return [points_awarded['l'] ,  points_awarded['w']]
        elif row['Home Goals'] == row['Away Goals']:
            return [points_awarded['d'], points_awarded['d']]
        else:
            return [points_awarded['w'], points_awarded['l']]

def TeamsPointsDict(df, teamslist):
    mydictionary = dict()
    for team in teamslist:
        # Points earned at home plus points earned away
        home = df.loc[df['Home Team'] == team, 'Home League Points'].sum()
        away = df.loc[df['Away Team'] == team, 'Away League Points'].sum()
        mydictionary[team] = home + away
    return mydictionary

Make a dictionary, or use some other way to store the values you want for win, loss and draw. Then write the function that takes in the DataFrame and does the work:

points_awarded = {'w':3,
                  'l':0,
                  'd':1}

def get_dict(df):
    df[['Home League Points', 'Away League Points']] = df.apply(lambda x : leaguetablescore(x,points_awarded), axis = 1, 
                                                            result_type ='expand')
    teamslist = list(df['Home Team']) + list(df['Away Team'])
    mydictionary = TeamsPointsDict(df, list(set(teamslist)))
    return mydictionary

dictionary = get_dict(df)
print(dictionary)

Output:

{'Reigate': 3, 'Bath': 1, 'Northampton': 0, 'Manchester': 1, 'Swindon': 6, 'Newcastle': 6}
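
If the function really has to start with func(df, w=3, l=0, d=1), the same steps could be folded into one function roughly like this. Treat it as a sketch rather than the definitive version: the inner helper points_for_game and the plain-dict accumulation are my own choices for illustration, and it leaves the DataFrame you pass in unmodified.

```python
import pandas as pd

def func(df, w=3, l=0, d=1):
    """Total league points per team, as a dict sorted highest first."""
    def points_for_game(row):
        # Returns [home points, away points] for one game.
        if row['Home Goals'] < row['Away Goals']:
            return [l, w]
        elif row['Home Goals'] == row['Away Goals']:
            return [d, d]
        return [w, l]

    # One row per game, with the points for the home and away side.
    points = df.apply(points_for_game, axis=1, result_type='expand')
    points.columns = ['Home', 'Away']

    # Add up each team's points across its home and away games.
    totals = {}
    for side in ('Home', 'Away'):
        for team, pts in zip(df[f'{side} Team'], points[side]):
            totals[team] = totals.get(team, 0) + int(pts)

    # Rebuild the dict in descending order of points.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

You would call it as func(df) for the default 3/0/1 scoring, or e.g. func(df, w=2) to change the points for a win.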

Upvotes: 1
