Allan
Allan

Reputation: 371

Pandas: New value based on time interval

Imagine a situation where every time a client visits us, they should pay $100, unless < 30 days have passed since the last time they made a payment.

Having the day of each client's appointment, is it possible to predict the appointments in which a payment should have been made??

Take the dataframe below as an example:

pd.DataFrame({
  'dd_mm_aa': {
    0: '01/12/21',
    1: '01/12/21',
    2: '10/12/21',
    3: '10/12/21',
    4: '03/01/22',
    5: '03/01/22',
    6: '03/01/22',
    7: '15/01/22',
    8: '15/01/22',
    9: '06/02/22'},
  'name': {0: 'John',
    1: 'Mary',
    2: 'John',
    3: 'Peter',
    4: 'John',
    5: 'Mary',
    6: 'Peter',
    7: 'Mary',
    8: 'John',
    9: 'John'}
    })

I was able to add the value to be paid at the first appointment of a client with the code below.

# Adding 150 at firts appearance of a patient
df['dd_mm_aa'] = pd.to_datetime(df['dd_mm_aa'], dayfirst=True)
df.loc[df.groupby('name')["dd_mm_aa"].rank() == 1, 'value'] = 100

After that, the dataframe is as follow:

dd_mm_aa     name    value
01/12/21    John    100
01/12/21    Mary    100
10/12/21    John    
10/12/21    Peter   100
03/01/22    John    
03/01/22    Mary    
03/01/22    Peter   
15/01/22    Mary    
15/01/22    John    
06/02/22    John    

However, considering the time interval of >30 days, the final output should be:

dd_mm_aa    name    value
01/12/21    John    100
01/12/21    Mary    100
10/12/21    John    
10/12/21    Peter   100
03/01/22    John    100
03/01/22    Mary    100
03/01/22    Peter   
15/01/22    Mary    
15/01/22    John    
06/02/22    John    100

Upvotes: 4

Views: 390

Answers (1)

Ming
Ming

Reputation: 479

Using iterative approach:

from datetime import timedelta

last_paid = {}
def check_paid(r):
    if r['name'] not in last_paid:
        last_paid[r['name']] = r['dd_mm_aa']
        r['value'] = 100
    elif last_paid[r['name']] + timedelta(days=30) < r['dd_mm_aa']:
        last_paid[r['name']] = r['dd_mm_aa']
        r['value'] = 100
    return r

df['dd_mm_aa'] = pd.to_datetime(df['dd_mm_aa'], dayfirst=True)
df = df.apply(check_paid, axis=1)

Output:

    dd_mm_aa    name    value
0   2021-12-01  John    100.0
1   2021-12-01  Mary    100.0
2   2021-12-10  John    NaN
3   2021-12-10  Peter   100.0
4   2022-01-03  John    100.0
5   2022-01-03  Mary    100.0
6   2022-01-03  Peter   NaN
7   2022-01-15  Mary    NaN
8   2022-01-15  John    NaN
9   2022-02-06  John    100.0

Upvotes: 3

Related Questions