Reputation: 363
I have medical transaction data that looks like this:
id date amt code
124 1/14/12 135 P
124 1/15/12 135 P
124 1/16/12 135 P
124 1/17/12 135 R
124 2/12/12 135 P
124 2/14/12 135 R
124 2/29/12 142 P
124 2/30/12 159 P
192 2/12/12 922 P
192 2/13/12 922 R
192 2/25/12 124 P
192 2/26/12 40 P
135 2/17/12 721 P
135 2/20/12 100 P
The 'P' code represents a payment, and the 'R' code represents a return. I need to identify patients who have more than 2 payments in the last 10 days that sum to more than $100 dollars. The trick is that I need to do this on a day-by-day basis, so an individual can trigger this logic multiple times over the month (e.g., two transactions starting 1/15/12, and three transactions starting 1/16/12 should each trigger once, another 2 transactions triggered this logic on 2/30/12).
What makes this problem more complex is that I have to ignore patients who have returned payment codes. So if patient '124' had 2 payments on 1/15/12 and a return on 1/17/12, they should not be flagged but they should be flagged once each for their transactions on 1/14/12 & 1/15/12 and on 2/29/12 and 2/30/12. The final data should look like this:
id flag_date count amt
124 1/15/12 2 270
124 1/16/12 3 405
124 2/30/12 2 301
192 2/26/12 2 164
135 2/20/12 2 821
Upvotes: 1
Views: 678
Reputation: 294258
I had to make sure we had a datetime column
df.date = pd.to_datetime(df.date)
df.sort_values('date') \
.assign(code=df.code.eq('P')) \
.groupby('id').rolling('10d', on='date')[['amt', 'code']].sum() \
.query('code >= 2 and amt > 100').reset_index()
id date amt code
0 124 2012-01-15 270.0 2.0
1 124 2012-01-16 405.0 3.0
2 124 2012-01-17 540.0 3.0
3 124 2012-02-29 301.0 2.0
4 135 2012-02-20 821.0 2.0
5 192 2012-02-26 164.0 2.0
Upvotes: 2