Reputation: 817
I have a DataFrame:
import pandas as pd
df = pd.DataFrame({'Customer': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B','B','B','B','B', 'B'],
'Date': ['1/1/2021', '2/1/2021','3/1/2021', '4/1/2021','5/1/2021', '6/1/2021','7/1/2021','8/1/2021', '1/1/2021', '2/1/2021','3/1/2021', '4/1/2021','5/1/2021', '6/1/2021','7/1/2021', '8/1/2021'],
'Amt': [0, 10, 10, 10, 0, 0, 0, 0, 0, 0, 10, 10, 0, 0, 10, 0]})
df
Customer Date Amt
0 A 1/1/2021 0
1 A 2/1/2021 10
2 A 3/1/2021 10
3 A 4/1/2021 10
4 A 5/1/2021 0
5 A 6/1/2021 0
6 A 7/1/2021 0
7 A 8/1/2021 0
8 B 1/1/2021 0
9 B 2/1/2021 0
10 B 3/1/2021 10
11 B 4/1/2021 10
12 B 5/1/2021 0
13 B 6/1/2021 0
14 B 7/1/2021 10
15 B 8/1/2021 0
What I'm trying to calculate is the amount of time between purchases, when there's been a gap. So, for customer A, there is no gap...they just churned and have not come back. Moreover, for Customer B, they enter at 3/1/21 (i.e., first purchase), and would have a two-month gap for 5/1-6/1.
How would I go about calculating this by customer?
edit: Desired output: by customer, gap (i.e., number of periods, in this case months)
Upvotes: 0
Views: 70
Reputation: 323226
We can do groupby
with ffill
and bfill
s = df.Amt.mask(df.Amt==0)
g = s.groupby(df['Customer'])
checker = g.ffill().notna() & g.bfill().notna() & s.isna()
checker.groupby(df['Customer']).sum()
Out[244]:
Customer
A 0
B 2
Name: Amt, dtype: int64
Update
x = checker.groupby([df['Customer'],checker.ne(True).cumsum()]).sum()
x[x.ne(0).idxmax()]
2
x[x.ne(0).groupby(level=0).idxmax()]
Customer Amt
A 1 0
B 12 2
Name: Amt, dtype: int64
Upvotes: 1
Reputation: 453
see if below example can help:
def diffcalc (amt):
last = 0
diff = 0
for i in range (0, len(amt), 1):
if amt[i] > 0:
diff = (i - last-1)
last = i
return diff
amt_a = [0, 10, 10, 10, 0, 0, 0, 0]
amt_b = [0, 10, 10, 0, 0, 10, 0, 0]
print('a', diffcalc(amt_a))
print('b', diffcalc(amt_b))
Upvotes: 0