Summing over specific time windows in python pandas

Question

I have medical transaction data that looks like this:

id   date   amt code
124 1/14/12 135 P
124 1/15/12 135 P
124 1/16/12 135 P
124 1/17/12 135 R
124 2/12/12 135 P
124 2/14/12 135 R
124 2/29/12 142 P
124 2/30/12 159 P
192 2/12/12 922 P
192 2/13/12 922 R
192 2/25/12 124 P
192 2/26/12  40 P
135 2/17/12 721 P
135 2/20/12 100 P

The 'P' code represents a payment, and the 'R' code represents a return. I need to identify patients who have more than 2 payments in the last 10 days that sum to more than $100 dollars. The trick is that I need to do this on a day-by-day basis, so an individual can trigger this logic multiple times over the month (e.g., two transactions starting 1/15/12, and three transactions starting 1/16/12 should each trigger once, another 2 transactions triggered this logic on 2/30/12).

What makes this problem more complex is that I have to ignore patients who have returned payment codes. So if patient '124' had 2 payments on 1/15/12 and a return on 1/17/12, they should not be flagged but they should be flagged once each for their transactions on 1/14/12 & 1/15/12 and on 2/29/12 and 2/30/12. The final data should look like this:

id flag_date count amt
124 1/15/12    2   270 
124 1/16/12    3   405
124 2/30/12    2   301
192 2/26/12    2   164
135 2/20/12    2   821

piRSquared · Accepted Answer

I had to make sure we had a datetime column

df.date = pd.to_datetime(df.date)

df.sort_values('date') \
    .assign(code=df.code.eq('P')) \
    .groupby('id').rolling('10d', on='date')[['amt', 'code']].sum() \
    .query('code >= 2 and amt > 100').reset_index()

    id       date    amt  code
0  124 2012-01-15  270.0   2.0
1  124 2012-01-16  405.0   3.0
2  124 2012-01-17  540.0   3.0
3  124 2012-02-29  301.0   2.0
4  135 2012-02-20  821.0   2.0
5  192 2012-02-26  164.0   2.0

Summing over specific time windows in python pandas

Answers (1)

Related Questions