Reputation: 735
Let's say I have a dataframe which looks like this:
pd.DataFrame({'category': [1,1,1,2,2,2,3,3,3,4],
'date_start': ['2018-04-09','2018-05-03', '2018-06-22', '2018-09-21', '2018-12-01', '2018-12-24', '2018-02-24', '2018-03-31','2018-08-01', '2018-01-29'],
'order_start': [1,2,3,1,2,3,1,2,3,1]})
Out[67]:
category date_start order_start
0 1 2018-04-09 1
1 1 2018-05-03 2
2 1 2018-06-22 3
3 2 2018-09-21 1
4 2 2018-12-01 2
5 2 2018-12-24 3
6 3 2018-02-24 1
7 3 2018-03-31 2
8 3 2018-08-01 3
9 4 2018-01-29 1
As you can see, order_start depends of category and date_start. It is just the order of the start by category.
What would be the easiest / most efficient way to create a new column which contains the difference in days with the previous start ? I would like something which looks like this:
pd.DataFrame({'category': [1,1,1,2,2,2,3,3,3,4],
'date_start': ['2018-04-09','2018-05-03', '2018-06-22', '2018-09-21', '2018-12-01', '2018-12-24', '2018-02-24', '2018-03-31','2018-08-01', '2018-01-29'],
'order_visit': [1,2,3,1,2,3,1,2,3,1],
'diff_with_last_start': [0, 30, 24, 0, 23, 56, 0, 43, 54, 0]})
Out[68]:
category date_start order_start diff_with_previous_start
0 1 2018-04-09 1 0
1 1 2018-05-03 2 25
2 1 2018-06-22 3 49
3 2 2018-09-21 1 0
4 2 2018-12-01 2 70
5 2 2018-12-24 3 56
6 3 2018-02-24 1 0
7 3 2018-03-31 2 43
8 3 2018-08-01 3 54
9 4 2018-01-29 1 0
Note 1: the difference is always 0 for the first start of each category.
Note 2: I didn't calculate the exact days difference in my example
Upvotes: 1
Views: 141
Reputation: 403198
Use groupby
and diff
:
pd.to_datetime(df['date_start']).groupby(df['category']).diff().dt.days
# or, if `date_start` is already datetime,
df.groupby('category')['date_start'].diff().dt.days
0 NaN
1 24.0
2 50.0
3 NaN
4 71.0
5 23.0
6 NaN
7 35.0
8 123.0
9 NaN
Name: date_start, dtype: float64
df['diff_with_previous_start'] = (
pd.to_datetime(df['date_start'])
.groupby(df['category'])
.diff()
.dt.days
.fillna(0, downcast='infer')
)
df
category date_start order_start diff_with_previous_start
0 1 2018-04-09 1 0
1 1 2018-05-03 2 24
2 1 2018-06-22 3 50
3 2 2018-09-21 1 0
4 2 2018-12-01 2 71
5 2 2018-12-24 3 23
6 3 2018-02-24 1 0
7 3 2018-03-31 2 35
8 3 2018-08-01 3 123
9 4 2018-01-29 1 0
Upvotes: 2