tj judge
tj judge

Reputation: 616

Substract previous row from preceding row by group WITH condition

I have a data frame

Count ID  Date 
 1     1  2020-07-09
 2     1  2020-07-11
 1     1  2020-07-21
 1     2  2020-07-04
 2     2  2020-07-09
 3     2  2020-07-18
 1     3  2020-07-02
 2     3  2020-07-05
 1     3  2020-07-19
 2     3  2020-07-22

I want to subtract the row in the date column by the row above it that has the same count BY each ID group. Those without the same count get a value of zero

Excepted output

ID  Date        Days 
1  2020-07-09   0
1  2020-07-11   0
1  2020-07-21   12  (2020-07-21 MINUS 2020-07-09)
2  2020-07-04   0
2  2020-07-09   0
2  2020-07-18   0
3  2020-07-02   0
3  2020-07-05   0
3  2020-07-19   17  (2020-07-19 MINUS 2020-07-02)
3  2020-07-22   17  (2020-07-22 MINUS 2020-07-05)

My initial thought process is to filter out Count-ID pairs, and then do the calculation.. I was wondering if there is a better workaround this>

Upvotes: 0

Views: 64

Answers (2)

SeaBean
SeaBean

Reputation: 23217

You can use groupby() to group by columns ID and Count, get the difference in days by .diff(). Fill NaN values with 0 by .fillna(), as follows:

df['Date'] = pd.to_datetime(df['Date'])   # convert to datetime if not already in datetime format

df['Days'] = df.groupby(['ID', 'Count'])['Date'].diff().dt.days.fillna(0, downcast='infer')

Result:

print(df)

   Count  ID       Date  Days
0      1   1 2020-07-09     0
1      2   1 2020-07-11     0
2      1   1 2020-07-21    12
3      1   2 2020-07-04     0
4      2   2 2020-07-09     0
5      3   2 2020-07-18     0
6      1   3 2020-07-02     0
7      2   3 2020-07-05     0
8      1   3 2020-07-19    17
9      2   3 2020-07-22    17

Upvotes: 1

Quixotic22
Quixotic22

Reputation: 2924

I like SeaBean's answer, but here is what I was working on before I saw that

df2 = df.sort_values(by = ['ID', 'Count'])
df2['Date'] = pd.to_datetime(df2['Date'])
df2['shift1'] = df2.groupby(['ID', 'Count'])['Date'].shift(1)
df2['diff'] = (df2.Date- df2.shift1.combine_first(df2.Date)  ).dt.days

Upvotes: 0

Related Questions