Reputation: 5776
I have a dataframe of a season's worth of basketball scores and I would like to find the number of days between games for each team for every game they have played in the season.
Example frame:
import datetime
import pandas as pd

testDateFrame = pd.DataFrame({'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
                              'AwayTeam' : ['CHI', 'DAL', 'CHI', 'DAL'],
                              'HomeGameNum': [1, 2, 2, 2],
                              'AwayGameNum' : [1, 1, 3, 3],
                              'Date' : [datetime.date(2014,3,11), datetime.date(2014,3,12), datetime.date(2014,3,14), datetime.date(2014,3,15)]})
My desired output is this:
AwayGameNum AwayTeam       Date  HomeGameNum HomeTeam  AwayRest  HomeRest
          1      CHI 2014-03-11            1      HOU       nan       nan
          1      DAL 2014-03-12            2      CHI       nan         0
          3      CHI 2014-03-14            2      DAL         1         1
          3      DAL 2014-03-15            2      HOU         0         3
Here AwayRest and HomeRest are the number of days between consecutive games for the away team and the home team respectively, minus 1. The value is nan for a team's first game.
Upvotes: 1
Views: 2276
Reputation: 28936
I would adjust your data layout a bit so that it fits Hadley Wickham's definition of Tidy Data. This makes the calculation much simpler. Eliminate the columns for AwayTeam and HomeTeam, and make a single column, Team. Then create a boolean column (HomeTeam) for whether the team is the home team.
Note: I didn't change AwayGameNum and HomeGameNum, so the numbers won't match your desired output. But the method will work.
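One way to get from the original wide frame to this one-row-per-team layout is sketched below. This is only an illustration, not necessarily how the frame in this answer was built: the names games, away and home are made up for the example, and converting Date to a pandas datetime column is an assumption made so that date arithmetic behaves predictably on current pandas versions.

```python
import datetime
import pandas as pd

games = pd.DataFrame({
    'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
    'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
    'HomeGameNum': [1, 2, 2, 2],
    'AwayGameNum': [1, 1, 3, 3],
    'Date': [datetime.date(2014, 3, 11), datetime.date(2014, 3, 12),
             datetime.date(2014, 3, 14), datetime.date(2014, 3, 15)],
})

# Away-team view of every game: AwayTeam becomes Team, flag is False.
away = games.drop(columns='HomeTeam').rename(columns={'AwayTeam': 'Team'})
away['HomeTeam'] = False

# Home-team view of every game: HomeTeam becomes Team, flag is True.
home = games.drop(columns='AwayTeam').rename(columns={'HomeTeam': 'Team'})
home['HomeTeam'] = True

# Stack both views; the stable sort keeps the away row ahead of the
# home row for each original game.
df = (pd.concat([away, home])
        .sort_index(kind='mergesort')
        .reset_index(drop=True))

# Assumption: store dates as datetime64 rather than datetime.date objects.
df['Date'] = pd.to_datetime(df['Date'])
```

The result has two rows per game, so each team's schedule can be followed by grouping on Team.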
In [34]: df
Out[34]:
AwayGameNum Team Date HomeGameNum HomeTeam
0 1 CHI 2014-03-11 1 False
1 1 HOU 2014-03-11 1 True
2 1 DAL 2014-03-12 2 False
3 1 CHI 2014-03-12 2 True
4 3 CHI 2014-03-14 2 False
5 3 DAL 2014-03-14 2 True
6 3 DAL 2014-03-15 2 False
7 3 HOU 2014-03-15 2 True
[8 rows x 5 columns]
In [62]: rest = df.groupby(['Team'])['Date'].diff() - datetime.timedelta(1)
In [63]: df['HomeRest'] = rest[df.HomeTeam]
In [64]: df['AwayRest'] = rest[~df.HomeTeam]
In [65]: df
Out[65]:
AwayGameNum Team Date HomeGameNum HomeTeam HomeRest AwayRest
0 1 CHI 2014-03-11 1 False NaT NaT
1 1 HOU 2014-03-11 1 True NaT NaT
2 1 DAL 2014-03-12 2 False NaT NaT
3 1 CHI 2014-03-12 2 True 0 days NaT
4 3 CHI 2014-03-14 2 False NaT 1 days
5 3 DAL 2014-03-14 2 True 1 days NaT
6 3 DAL 2014-03-15 2 False NaT 0 days
7 3 HOU 2014-03-15 2 True 3 days NaT
[8 rows x 7 columns]
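If the one-row-per-game layout from the question is still needed at the end, the same groupby/diff idea can be wrapped in a helper that reshapes, computes the rest days, and writes them back. The following is a rough sketch under a few assumptions: add_rest_days and the intermediate column names game, side and Rest are hypothetical, the frame's index is assumed to be unique, and rest is returned as a number of days (float, with NaN for a team's first game) rather than as timedeltas.

```python
import datetime
import pandas as pd

def add_rest_days(games):
    """Return a copy of games with AwayRest and HomeRest columns (days)."""
    out = games.copy()
    dates = pd.to_datetime(out['Date'])

    # Build one row per (game, team), remembering which game row and
    # which side (Away/Home) each entry came from.
    parts = []
    for side in ('Away', 'Home'):
        part = pd.DataFrame({'Team': out[side + 'Team'], 'Date': dates})
        part['side'] = side
        parts.append(part.rename_axis('game').reset_index())
    long = pd.concat(parts, ignore_index=True)

    # diff() works on row order, so sort chronologically first.
    long = long.sort_values('Date', kind='mergesort')

    # Days since the team's previous game, minus 1 (NaN for the first game).
    long['Rest'] = long.groupby('Team')['Date'].diff().dt.days - 1

    # Write each side's values back; assignment aligns on the game index.
    for side in ('Away', 'Home'):
        rest = long[long['side'] == side].set_index('game')['Rest']
        out[side + 'Rest'] = rest
    return out

games = pd.DataFrame({
    'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
    'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
    'HomeGameNum': [1, 2, 2, 2],
    'AwayGameNum': [1, 1, 3, 3],
    'Date': [datetime.date(2014, 3, 11), datetime.date(2014, 3, 12),
             datetime.date(2014, 3, 14), datetime.date(2014, 3, 15)],
})
result = add_rest_days(games)
```

On the example frame this should reproduce the AwayRest and HomeRest values from the question's desired output.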
Upvotes: 5