Reputation: 353
I have two dataframes:
df1 = pd.DataFrame({'player': ['AB','AB','AB'], 'contract_length':[2,3,1], 'year': [1998,2000,2003]})
df2 = pd.DataFrame({'player': ['AB','AB','AB','AB','AB','AB'], 'year':[1998,1999,2000,2001,2002,2003],'player_value': [4,3,7,10,9,2]})
df1
player contract_length year
0 AB 2 1998
1 AB 3 2000
2 AB 1 2003
df2
player year player_value
0 AB 1998 4
1 AB 1999 3
2 AB 2000 7
3 AB 2001 10
4 AB 2002 9
5 AB 2003 2
The first dataframe lists the contracts that the player has signed. Ex: In 1998, he signed a 2 year contract.
The second dataframe lists the individual seasons and a value that I have placed for each of them.
I am trying to make a new column on df1 that sums the total player values from df2 based on the contract year and the length of the contract. For example, the first row on df1 is 1998 and 2 years. So the value would be 7, coming from the player values of 4 and 3 from 1998 and 1999 (2 year contract).
I can't seem to figure out why this isn't returning the correct results:
for i,row in df1.iterrows():
year_list = list(range(row['year'],((row['year'])+(row['contract_length']))))
player = row['player']
df = pd.DataFrame(columns=['player_value'])
for year in year_list:
player_value = df2.loc[(df2['player']==player) & (df2['year'] == year),['player_value']]
df1['contract_value'] = sum(df['player_value'])
This is returned:
player contract_length year contract_value
0 AB 2 1998 0
1 AB 3 2000 0
2 AB 1 2003 0
When it should be:
player contract_length year contract_value
0 AB 2 1998 7
1 AB 3 2000 26
2 AB 1 2003 2
There are no errors returned. Just the zeros in the last column.
Upvotes: 1
Views: 469
Reputation: 75080
Consider repeating
the dataframe according to contract_length
then assigning
another column which adds
the years based on the group and then merging with the second:
final = (df1.loc[df1.index.repeat(df1['contract_length'])]
.assign(year1 = lambda x: x['year']+x.groupby('year').cumcount())
.merge(df2, left_on = ['player','year1'],right_on = ['player','year']
,suffixes = ('','_y')).groupby(['player','contract_length','year']
,sort=False,as_index=False)['player_value'].sum())
player contract_length year player_value
0 AB 2 1998 7
1 AB 3 2000 26
2 AB 1 2003 2
breaking this down to 2 steps:
m = df1.loc[df1.index.repeat(df1['contract_length'])].assign(year1 = lambda x:
x['year']+x.groupby('year').cumcount())
final1 = (m.merge(df2,left_on = ['player','year1'],right_on=['player','year']
,suffixes=('','_y').groupby(['player','contract_length','year']
,sort=False,as_index=False)['player_value'].sum())
player contract_length year player_value
0 AB 2 1998 7
1 AB 3 2000 26
2 AB 1 2003 2
Just so you know what we are merging the second dataframe with:
print(m)
player contract_length year year1
0 AB 2 1998 1998
0 AB 2 1998 1999
1 AB 3 2000 2000
1 AB 3 2000 2001
1 AB 3 2000 2002
2 AB 1 2003 2003
Upvotes: 1
Reputation: 195408
Another attempt, using .explode()
:
df1['contract_value'] = pd.merge(
df1.assign(years=df1.apply(lambda x: [*range(x['year'], x['year'] + x['contract_length'])] ,axis=1)).explode('years'),
df2, left_on=['player', 'years'], right_on=['player', 'year']
).groupby(['player', 'year_x'], as_index=False)['player_value'].sum()['player_value']
print(df1)
Prints:
player contract_length year contract_value
0 AB 2 1998 7
1 AB 3 2000 26
2 AB 1 2003 2
Upvotes: 0
Reputation: 5730
Get slice for each year with contract langth and then sum palyer_value
.
import pandas as pd
df1 = pd.DataFrame({'player': ['AB','AB','AB'], 'contract_length':[2,3,1], 'year': [1998,2000,2003]})
df2 = pd.DataFrame({'player': ['AB','AB','AB','AB','AB','AB'], 'year':[1998,1999,2000,2001,2002,2003],'player_value': [4,3,7,10,9,2]})
data = []
for index, row in df1.iterrows():
contract_data = df2[(df2['year'] >= row['year']) & (df2['year'] <= row['year']+row['contract_length']-1)]
sum = contract_data['player_value'].sum()
data.append(sum)
df1['contract_value'] = data
Output:
player contract_length year contract_value
0 AB 2 1998 7
1 AB 3 2000 26
2 AB 1 2003 2
Upvotes: 2