flyingmeatball
flyingmeatball

Reputation: 7997

pandas find count between dates for list of month ends

I have a list of customers with a "start date" and and "end date". For any given time period, my goal is to find how many customers I have active. A customer is active if their start date is before x and their end date is after x. I've written a brute force version of this:

from datetime import datetime
import pandas as pd

#dates of interest
dates = ['2016-01-31','2016-02-29','2016-03-31','2016-04-30','2016-05-31']
dates = [datetime.strptime(x, '%Y-%m-%d') for x in dates]

#sample records
df = pd.DataFrame( [['A','2016-01-01','2016-04-23'],['B','2016-02-05','2016-04-30'],['C','2016-02-02','2016-05-25']],columns = ['customerId','startDate','endDate'])
df['startDate'] = pd.to_datetime(df['startDate'])
df['endDate'] = pd.to_datetime(df['endDate'])

output = []
#is there a better way to do this?
for currDate in dates:
    record_count = len(df[(df['startDate']<= currDate) & (df['endDate']>= currDate)])
    output.append([currDate,record_count])


output = pd.DataFrame(output, columns = ['date','active count'])

Is there a better way to find how many customers are active between each date of interest? Right now I'm just iterating through all the dates, but that doesn't feel very "pythonic" to me.

Any thoughts or assistance would be appreciated!

Upvotes: 0

Views: 144

Answers (1)

Nehal J Wani
Nehal J Wani

Reputation: 16629

One way would be:

In [142]: tf = pd.DataFrame({'dates': dates})
In [143]: tf['active_count'] = tf['dates'].apply(lambda x: df[(df['startDate']<= x) & (df['endDate']>= x)].count())
In [144]: tf
Out[144]: 
       dates  active_count
0 2016-01-31             1
1 2016-02-29             3
2 2016-03-31             3
3 2016-04-30             2
4 2016-05-31             0

Upvotes: 1

Related Questions