Iterating through rows in a df and creating a new column based on those values

Question

I want to create a new relative score (column) that compares F1 drivers to their team-mates within a given year and a given team.

My data look like:

stats_df.head()

  driver  year        team  points
0    AIT  2020    Williams     0.0
1    ALB  2019    Red Bull    76.0
2    ALB  2019  AlphaTauri    16.0
3    ALB  2020    Red Bull   105.0
4    ALO  2013     Ferrari   242.0

I tried:

teams = stats_df['team'].unique()
years = stats_df['year'].unique()
drivers = stats_df['driver'].unique()

for year in years:
    for team in teams:
        team_points = stats_df['points'].loc[stats_df['team']==team].loc[stats_df['year']==year].sum()
        for driver in drivers:
            driver_points = stats_df['points'].loc[stats_df['team']==team].loc[stats_df['year']==year].loc[stats_df['driver']==driver]
            power_score = driver_points/(team_points/2)
            stats_df['power_score'].loc[stats_df['team']==team].loc[stats_df['year']==year].loc[stats_df['driver']==driver] = power_score

This results in NaNs in the new column ('power_score').

Help would be appreciated.

Andrej Kesely · Accepted Answer

Looking at your code, you can compute team_points with .groupby(["team", "year"]) and then simply divide points by these values (df below is your stats_df):

team_points = df.groupby(["team", "year"])["points"].transform("sum")
df["power_score"] = df["points"] / (team_points / 2)
print(df)

Prints:

  driver  year        team  points  power_score
0    AIT  2020    Williams     0.0          NaN
1    ALB  2019    Red Bull    76.0          2.0
2    ALB  2019  AlphaTauri    16.0          2.0
3    ALB  2020    Red Bull   105.0          2.0
4    ALO  2013     Ferrari   242.0          2.0
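
In the five sample rows every driver is alone in their team/year group, so every score comes out as 2.0, and the 0-point row gives NaN because 0.0 / 0.0 is NaN in pandas. To see the score actually separate team-mates, here is a minimal sketch on made-up data. The driver codes, team names and points below are hypothetical placeholders, not real results:

```python
import pandas as pd

# Hypothetical data: two drivers per team within the same year.
df = pd.DataFrame(
    {
        "driver": ["D1", "D2", "D3", "D4"],
        "year": [2020, 2020, 2020, 2020],
        "team": ["Team A", "Team A", "Team B", "Team B"],
        "points": [300.0, 100.0, 80.0, 120.0],
    }
)

# Sum of points per (team, year), broadcast back to every row of that group.
team_points = df.groupby(["team", "year"])["points"].transform("sum")

# A driver's points relative to half of the team total
# (1.0 means an even split between two team-mates).
df["power_score"] = df["points"] / (team_points / 2)

print(df)
```

With two drivers per team the scores in each team/year group add up to 2.0, so a value above 1.0 means the driver out-scored their team-mate. If a team scored no points at all, the result stays NaN unless you decide how to handle that case yourself, for example with .fillna().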
