mgadfly
mgadfly

Reputation: 57

Creating new DF column based on average values from specific columns identified in second DF

I apologize as I prefer to ask questions where I've made an attempt at the code needed to resolve the issue. Here, despite many attempts, I haven't gotten any closer to a resolution (in part because I'm a hobbyist and self-taught). I'm attempting to use two dataframes together to calculate the average values in a specific column, then generate a new column to store that average.

I have two dataframes. The first contains the players and their stats. The second contains a list of each player's opponents during the season.

What I'm attempting to do is use the two dataframes to calculate expected values when facing a specific opponent. Stated otherwise, I'd like to be able to see if a player is performing better or worse than the expected results based on the opponent but first need to calculate the average of their opponents.

My dataframes actually have thousands of players and hundreds of matchups, so I've shortened them here to have a representative dataframe that isn't overwhelming.

The first dataframe (df) contains five columns. Name, STAT1, STAT2, STAT3, and STAT4.

The second dataframe (df_Schedule) has a Name column but then has a separate column for each opponent faced. The df_Schedule usually contains different numbers of columns depending on the week of the season. For example, after week 1 there may be four columns. After week 26 there might be 100 columns. For simplicity sake, I've included just five columns ['Name', 'Opp1', 'Opp2', 'Opp3', 'Opp4', 'Opp5'].

Using these two dataframes I'm trying to create new columns in the first dataframe (df). EXP1 (for "Expected STAT1"), EXP2, EXP3, EXP4. The expected columns are simply an average of the STAT columns based on the opponents faced during the season. For example, Edgar faced Ralph three times, Marc once and David once. The formula to calculate Edgar's EXP1 is simply:

((Ralph.STAT1 * 3) + (Marc.STAT1 * 1) + (David.STAT1 * 1) / Number_of_Contests (which is five in this example) = 100.2

import pandas as pd


data = {'Name':['Edgar', 'Ralph', 'Marc', 'David'],
        'STAT1':[100, 96, 110, 103],
       'STAT2':[116, 93, 85, 100],
       'STAT3':[56, 59, 41, 83],
       'STAT4':[55, 96, 113, 40],}

data2 = {'Name':['Edgar', 'Ralph', 'Marc', 'David'],
        'Opp1':['Ralph', 'Edgar', 'David', 'Marc'],
       'Opp2':['Ralph', 'Edgar', 'David', 'Marc'],
       'Opp3':['Marc', 'David', 'Edgar', 'Ralph'],
       'Opp4':['David', 'Marc', 'Ralph', 'Edgar'],
        'Opp5':['Ralph', 'Edgar', 'David', 'Marc'],}

df = pd.DataFrame(data)

df_Schedule = pd.DataFrame(data2)



print(df)

print(df_Schedule)

I would like the result to be something like:

data_Final = {'Name':['Edgar', 'Ralph', 'Marc', 'David'],
        'STAT1':[100, 96, 110, 103],
       'STAT2':[116, 93, 85, 100],
       'STAT3':[56, 59, 41, 83],
       'STAT4':[55, 96, 113, 40],
       'EXP1':[100.2, 102.6, 101, 105.2],
       'EXP2':[92.8, 106.6, 101.8, 92.8],
       'EXP3':[60.2, 58.4, 72.8, 47.6],
       'EXP4':[88.2, 63.6, 54.2, 98],}

df_Final = pd.DataFrame(data_Final)

print(df_Final)

Is there a way to use the scheduling dataframe to lookup the values of opponents, average them, and then create a new column based on those averages?

Upvotes: 1

Views: 81

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195553

Try:

df = df.set_index("Name")
df_Schedule = df_Schedule.set_index("Name")

for i, c in enumerate(df.filter(like="STAT"), 1):
    df[f"EXP{i}"] = df_Schedule.replace(df[c]).mean(axis=1)

print(df.reset_index())

Prints:

    Name  STAT1  STAT2  STAT3  STAT4   EXP1   EXP2  EXP3  EXP4
0  Edgar    100    116     56     55  100.2   92.8  60.2  88.2
1  Ralph     96     93     59     96  102.6  106.6  58.4  63.6
2   Marc    110     85     41    113  101.0  101.8  72.8  54.2
3  David    103    100     83     40  105.2   92.8  47.6  98.0

Upvotes: 1

Related Questions