Reputation: 2179
I have the data set of customers with their policies, I am trying to find the number of months the customer is with us. (tenure)
df
cust_no poly_no start_date end_date
1 1 2016-06-01 2016-08-31
1 2 2017-05-01 2018-05-31
1 3 2016-11-01 2018-05-31
output should look like,
cust_no no_of_months
1 22
So basically, it should get rid of the months where there is no policy and count the overlapping period once not twice. I have to do this for every customers, so group by cust_no, how can i do this?
Thanks.
Upvotes: 2
Views: 69
Reputation: 1439
For multiple customers you can use groupby
. Continuing with @ScottBoston's answer:
df_range = df.apply(lambda r: pd.Series(
pd.date_range(start=r.start_date, end=r.end_date, freq='M')
.values), axis=1)
df_range.groupby('cust_no').apply(lambda x: x.stack().unique().shape[0])
Upvotes: 0
Reputation: 153460
One way to do this is to create date ranges for each records, then use stack to get all the months. Next, take the unique values only to count a month only once:
s = df.apply(lambda x: pd.Series(pd.date_range(x.start_date, x.end_date, freq='M').values), axis=1)
ss = s.stack().unique()
ss.shape[0]
Output:
22
Upvotes: 1