Eric L

Reputation: 97

Python pandas adding column to a dataframe based on lookup in another dataframe

I have one DataFrame with the first and last game dates for each NBA team, and another DataFrame with each team's ELO before and after each game. I would like to add two columns to DF1 with the team's ELO at the first and last dates specified: for the dates in the first column I would like ELO1, and for the dates in the second column I would like ELO2. Even better would be a way to get the difference between the two ELOs directly into a single column, since that is what I'll be computing eventually.

DF1:

           first        last
team
ATL   2017-10-18  2018-04-10
BOS   2017-10-17  2018-04-11
BRK   2017-10-18  2018-04-11
CHI   2017-10-19  2018-04-11
[...]

DF2:

            date team   ELO_before    ELO_after
65782 2017-10-18  ATL  1648.000000  1650.308911
65783 2017-10-17  BOS  1761.000000  1753.884111
65784 2017-10-18  BRK  1427.000000  1439.104231
65785 2017-10-19  CHI  1458.000000  1464.397752
65786 2018-04-10  ATL  1406.000000  1411.729285
[...]

Thanks in Advance!

Edit - The resulting data frame I want would look like:

DF3:

           first        last   ELO_before             ELO_after
team
ATL   2017-10-18  2018-04-10  1648.000000           1411.729285
BOS   2017-10-17  2018-04-11  1761.000000  [Elo2 for last game]
BRK   2017-10-18  2018-04-11  1427.000000  [Elo2 for last game]
CHI   2017-10-19  2018-04-11  1458.000000  [Elo2 for last game]

Upvotes: 3

Views: 7548

Answers (1)

nijm

Reputation: 2218

You can use pandas.DataFrame.merge for this:

import pandas as pd

# frames from the question
df1 = pd.DataFrame(data={
  'team': ['ATL', 'BOS', 'BRK', 'CHI'],
  'first': ['2017-10-18', '2017-10-17', '2017-10-18', '2017-10-19'],
  'last': ['2018-04-10', '2018-04-11', '2018-04-11', '2018-04-11']
}).set_index('team')

df2 = pd.DataFrame(data={
  'date': ['2017-10-18', '2017-10-17', '2017-10-18', '2017-10-19', '2018-04-10'],
  'team': ['ATL', 'BOS', 'BRK', 'CHI', 'ATL'],
  'ELO_before': [1648.0, 1761.0, 1427.0, 1458.0, 1406.0],
  'ELO_after': [1650.308911, 1753.884111, 1439.104231, 1464.397752, 1411.729285]
})

# merge on first and last; 'team' must be a regular column to be used as a merge key
df1 = df1.reset_index()

# ELO_before: match each team's 'first' date against 'date' in df2
df3 = df1.merge(
    df2.drop(columns='ELO_after'),
    how='left', left_on=['team', 'first'], right_on=['team', 'date'],
).drop(columns='date')

# ELO_after: match each team's 'last' date against 'date' in df2
df3 = df3.merge(
    df2.drop(columns='ELO_before'),
    how='left', left_on=['team', 'last'], right_on=['team', 'date'],
).drop(columns='date')

# calculate the differences
df3['ELO_difference'] = df3['ELO_after'] - df3['ELO_before']
df3 = df3.set_index('team')
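
If you only need the lookup and the difference, an alternative to the two merges is to index df2 by (team, date) and pull the values out with reindex. This is only a sketch of that idea, not part of the approach above: it assumes each team has at most one row per date in df2, and that the date columns have the same dtype in both frames (here they are converted with pd.to_datetime). The names elo, first_keys, last_keys and result are made up for the illustration.

```python
import pandas as pd

# Sample frames from the question, with dates parsed as datetimes (assumption).
df1 = pd.DataFrame({
    'team': ['ATL', 'BOS', 'BRK', 'CHI'],
    'first': pd.to_datetime(['2017-10-18', '2017-10-17', '2017-10-18', '2017-10-19']),
    'last': pd.to_datetime(['2018-04-10', '2018-04-11', '2018-04-11', '2018-04-11']),
}).set_index('team')

df2 = pd.DataFrame({
    'date': pd.to_datetime(['2017-10-18', '2017-10-17', '2017-10-18',
                            '2017-10-19', '2018-04-10']),
    'team': ['ATL', 'BOS', 'BRK', 'CHI', 'ATL'],
    'ELO_before': [1648.0, 1761.0, 1427.0, 1458.0, 1406.0],
    'ELO_after': [1650.308911, 1753.884111, 1439.104231, 1464.397752, 1411.729285],
})

# Lookup table keyed by (team, date); assumes the pairs are unique.
elo = df2.set_index(['team', 'date'])

# One (team, date) key per row of df1, for the first and the last game.
first_keys = pd.MultiIndex.from_arrays([df1.index, df1['first']])
last_keys = pd.MultiIndex.from_arrays([df1.index, df1['last']])

result = df1.copy()
# reindex returns NaN for (team, date) pairs that are missing from df2.
result['ELO_before'] = elo['ELO_before'].reindex(first_keys).to_numpy()
result['ELO_after'] = elo['ELO_after'].reindex(last_keys).to_numpy()
result['ELO_difference'] = result['ELO_after'] - result['ELO_before']
```

With the five sample rows only ATL has a matching last game, so the other teams get NaN for ELO_after and ELO_difference until the rest of df2 is present. The merge version behaves the same way because of how='left'.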

Upvotes: 3
