ds_user
ds_user

Reputation: 2179

Finding number of months between overlapping periods - pandas

I have the data set of customers with their policies, I am trying to find the number of months the customer is with us. (tenure)

df

cust_no poly_no start_date  end_date
1       1       2016-06-01   2016-08-31
1       2       2017-05-01   2018-05-31
1       3       2016-11-01   2018-05-31

output should look like,

cust_no no_of_months
1        22

So basically, it should get rid of the months where there is no policy and count the overlapping period once not twice. I have to do this for every customers, so group by cust_no, how can i do this?

Thanks.

Upvotes: 2

Views: 69

Answers (2)

gofvonx
gofvonx

Reputation: 1439

For multiple customers you can use groupby. Continuing with @ScottBoston's answer:

df_range = df.apply(lambda r: pd.Series(
                    pd.date_range(start=r.start_date, end=r.end_date, freq='M')
                    .values), axis=1)
df_range.groupby('cust_no').apply(lambda x: x.stack().unique().shape[0])

Upvotes: 0

Scott Boston
Scott Boston

Reputation: 153460

One way to do this is to create date ranges for each records, then use stack to get all the months. Next, take the unique values only to count a month only once:

s = df.apply(lambda x: pd.Series(pd.date_range(x.start_date, x.end_date, freq='M').values), axis=1)
ss = s.stack().unique()
ss.shape[0]

Output:

22

Upvotes: 1

Related Questions