Reputation: 353

Pulling values from two dataframes based on conditions

I have two dataframes:

df1 = pd.DataFrame({'player': ['AB','AB','AB'], 'contract_length':[2,3,1], 'year': [1998,2000,2003]})
df2 = pd.DataFrame({'player': ['AB','AB','AB','AB','AB','AB'], 'year':[1998,1999,2000,2001,2002,2003],'player_value': [4,3,7,10,9,2]})

df1
  player    contract_length     year
0   AB            2             1998
1   AB            3             2000
2   AB            1             2003

df2
    player  year    player_value
0   AB      1998    4
1   AB      1999    3
2   AB      2000    7
3   AB      2001    10
4   AB      2002    9
5   AB      2003    2

The first dataframe lists the contracts that the player has signed. Ex: In 1998, he signed a 2 year contract.

The second dataframe lists the individual seasons and a value that I have placed for each of them.

I am trying to make a new column on df1 that sums the total player values from df2 based on the contract year and the length of the contract. For example, the first row on df1 is 1998 and 2 years. So the value would be 7, coming from the player values of 4 and 3 from 1998 and 1999 (2 year contract).

I can't seem to figure out why this isn't returning the correct results:

for i,row in df1.iterrows():
    year_list = list(range(row['year'],((row['year'])+(row['contract_length']))))
    player = row['player']
    df = pd.DataFrame(columns=['player_value'])
    for year in year_list:
        player_value = df2.loc[(df2['player']==player) & (df2['year'] == year),['player_value']]
        df1['contract_value'] = sum(df['player_value'])

This is returned:

player  contract_length year    contract_value
0   AB     2            1998    0
1   AB     3            2000    0
2   AB     1            2003    0

When it should be:

player  contract_length year    contract_value
0   AB     2            1998    7
1   AB     3            2000    26
2   AB     1            2003    2

There are no errors returned. Just the zeros in the last column.

Upvotes: 1

Answers (3)

anky

Reputation: 75150

Consider repeating the dataframe according to contract_length then assigning another column which adds the years based on the group and then merging with the second:

final = (df1.loc[df1.index.repeat(df1['contract_length'])]
        .assign(year1 = lambda x: x['year']+x.groupby('year').cumcount())
        .merge(df2, left_on = ['player','year1'],right_on = ['player','year']
        ,suffixes = ('','_y')).groupby(['player','contract_length','year']
        ,sort=False,as_index=False)['player_value'].sum())

  player  contract_length  year  player_value
0     AB                2  1998             7
1     AB                3  2000            26
2     AB                1  2003             2

breaking this down to 2 steps:

m = df1.loc[df1.index.repeat(df1['contract_length'])].assign(year1 = lambda x:
             x['year']+x.groupby('year').cumcount())
final1 = (m.merge(df2,left_on = ['player','year1'],right_on=['player','year']
         ,suffixes=('','_y').groupby(['player','contract_length','year']
          ,sort=False,as_index=False)['player_value'].sum())

   player  contract_length  year  player_value
0     AB                2  1998             7
1     AB                3  2000            26
2     AB                1  2003             2

Just so you know what we are merging the second dataframe with:

print(m)

  player  contract_length  year  year1
0     AB                2  1998   1998
0     AB                2  1998   1999
1     AB                3  2000   2000
1     AB                3  2000   2001
1     AB                3  2000   2002
2     AB                1  2003   2003

Upvotes: 1

Andrej Kesely

Reputation: 195613

Another attempt, using .explode():

df1['contract_value'] = pd.merge(
        df1.assign(years=df1.apply(lambda x: [*range(x['year'], x['year'] + x['contract_length'])] ,axis=1)).explode('years'),
        df2, left_on=['player', 'years'], right_on=['player', 'year']
    ).groupby(['player', 'year_x'], as_index=False)['player_value'].sum()['player_value']

print(df1)

Prints:

  player  contract_length  year  contract_value
0     AB                2  1998               7
1     AB                3  2000              26
2     AB                1  2003               2

Upvotes: 0

Zaraki Kenpachi

Reputation: 5740

Get slice for each year with contract langth and then sum palyer_value.

import pandas as pd

df1 = pd.DataFrame({'player': ['AB','AB','AB'], 'contract_length':[2,3,1], 'year': [1998,2000,2003]})
df2 = pd.DataFrame({'player': ['AB','AB','AB','AB','AB','AB'], 'year':[1998,1999,2000,2001,2002,2003],'player_value': [4,3,7,10,9,2]})

data = []
for index, row in df1.iterrows():
    contract_data = df2[(df2['year'] >= row['year']) & (df2['year'] <= row['year']+row['contract_length']-1)]
    sum = contract_data['player_value'].sum()
    data.append(sum)

df1['contract_value'] = data

Output:

  player  contract_length  year  contract_value
0     AB                2  1998               7
1     AB                3  2000              26
2     AB                1  2003               2

Upvotes: 2

Pulling values from two dataframes based on conditions

Answers (3)

Related Questions