Reputation: 93
I want to add dates and days that are contained in a column after grouping by an ID column.
The following generates an example df:
df = pd.DataFrame(
{
"ID":[1,1,1,1,2,2,2,3,3,3,3,3,3],
"Date":list(pd.date_range("2018-1-1", "2018-4-10", periods=4)) + list(pd.date_range("2018-6-6", "2018-7-30", periods=3)) + list(pd.date_range("2018-1-1", "2020-1-1", periods=6))
}
)
df['date_intervals'] = df.groupby('ID').Date.diff()
df['new_date_intermediate'] = df.date_intervals.mask(pd.isnull, df['Date'])
Grouped by the ID field, I want a cumulative sum returning dates.
For example, for ID = 1, I want a vector of the first row + the second row, this would be 2018-01-01 + 33 days, followed by the result of that sum plus the third row which is adding another 33 days.
Upvotes: 2
Views: 573
Reputation: 323306
You can just do cumsum
df['new_date_intermediate'] = df.groupby('ID')['new_date_intermediate'].apply(lambda x :x.cumsum())
df
ID Date date_intervals new_date_intermediate
0 1 2018-01-01 NaT 2018-01-01 00:00:00
1 1 2018-02-03 33 days 2018-02-03 00:00:00
2 1 2018-03-08 33 days 2018-03-08 00:00:00
3 1 2018-04-10 33 days 2018-04-10 00:00:00
4 2 2018-06-06 NaT 2018-06-06 00:00:00
5 2 2018-07-03 27 days 2018-07-03 00:00:00
6 2 2018-07-30 27 days 2018-07-30 00:00:00
7 3 2018-01-01 NaT 2018-01-01 00:00:00
8 3 2018-05-27 146 days 2018-05-27 00:00:00
9 3 2018-10-20 146 days 2018-10-20 00:00:00
10 3 2019-03-15 146 days 2019-03-15 00:00:00
11 3 2019-08-08 146 days 2019-08-08 00:00:00
12 3 2020-01-01 146 days 2020-01-01 00:00:00
Upvotes: 1