RyanG73
RyanG73

Reputation: 73

Find difference in dates among entries across multiple columns

df1=

Date     Team1     Team2
6/1      Boston    New York
6/13     New York  Boston
6/27     Boston    New York

I am trying to calculate the number of days since the last time Boston appeared in either column, but I can only figure out how to look it up within one column, using df1['Days since Boston played'] = df1.groupby('Team1')['Date'].diff().fillna(0)

What I would like the output to be:

Date     Team1     Team2      Days since Boston played
6/1      Boston    New York   0
6/13     New York  Boston     12
6/27     Boston    New York   14

EDIT - expanding the dataframe to learn how this can be applied to all teams, not just one What I would like the output to be:

Date     Team1     Team2      Days since **Team1** played
6/1      Boston    New York   0
6/13     New York  Chicago    12
6/27     Boston    New York   14
6/28     Chicago   Boston     15

Upvotes: 1

Views: 45

Answers (2)

Vaishali
Vaishali

Reputation: 38415

Use isin to check if Boston is present in either team1 or team2 and find timedelta

df['Date'] = pd.to_datetime(df['Date'], format = '%m/%d')

df.loc[df.isin(['Boston']).any(1),'Days since Boston played'] = df.loc[df.isin(['Boston']).any(1), 'Date'].diff().dt.days


    Date        Team1       Team2       Days since Boston played
0   1900-06-01  Boston      New York    NaN
1   1900-06-13  New York    Boston      12.0
2   1900-06-27  Boston      New York    14.0

If you want the date column back in original format, you can use strftime

df['Date'] = df['Date'].dt.strftime('%m/%d')

    Date    Team1       Team2       Days since Boston played
0   06/01   Boston      New York    NaN
1   06/13   New York    Boston      12.0
2   06/27   Boston      New York    14.0

Upvotes: 2

Meow
Meow

Reputation: 1267

You can do a groupby on Team1 and then do a difference on the dates:

# Note: you should post a creatable example in your post next time 
data = {
    'Date': ['2018-06-01', '2018-06-13', '2018-06-27'],
    'Team1':['Boston', 'New York', 'Boston'], 
    'Team2':['New York', 'Boston', 'New York']
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df['Time between games'] = df.groupby('Team1')['Date'].diff().fillna(0)

This will actually give you the difference between games for all home teams.

Upvotes: 0

Related Questions