Reputation: 1694
I'd like to subtract dates if the next row's id is the same. I'm able to subtract dates, but stuck on creating the condition to check if the next row has the same id.
d = {'date':['2021-01', '2020-01', '2020-05', '2021-01'], 'id':['a', 'a', 'b', 'b']}
df = pd.DataFrame(data=d)
date id
2021-01 a
2020-01 a
2020-05 b
2021-01 b
My code
df = df.sort_values(by=['id', 'date'])
df['date_diff'] = pd.to_datetime(df['date']) - pd.to_datetime(df['date'].shift())
result
date id date_diff
2020-01 a NaT
2021-01 a 366 days
2020-05 b -245 days
2021-01 b 245 days
Expected result should as below, which the dates only be subtracted when the ids are the same.
Upvotes: 1
Views: 527
Reputation: 323236
Chain with groupby
df['date'] = pd.to_datetime(df['date'])
df['date_diff'] = df.groupby('id')['date'].diff()
Upvotes: 2
Reputation: 26676
df['date']=pd.to_datetime(df['date'])
df['date_diff']=df.groupby('id')['date'].diff()
Upvotes: 2