Create a timeseries from start and end dates

Question

I want to create a time-series from a table that contains Start and End dates for subscriptions. The input table contains account names for subscriptions and their respective start and end dates.

input:
    accountname    startDate    endDate
     abc          31/12/2020   2/1/2021
     xyz          2/1/2021     4/1/2021
     pqr          4/1/2021     6/1/2021 
     stu          2/1/2021     3/1/2021

The value of the time-series will be the count of accountnames that are active on that day i.e the count of accountnames that are within their respective start and end dates.

output:

    TSDate          value (count of active subscriptions)
    1/1/2021       1
    2/1/2021       3
    3/1/2021       2
    4/1/2021       2

I can think of to solve this by creating a series of dates and iterating over the input each time to determine the count of records where TSdate>endDate.

What would be the right approach to solve this using python? Is there any library that I can leverage?

jezrael · Accepted Answer

I think you need first convert values to datetimes and then use concat in list comprehension, last use Index.value_counts:

df['startDate']= pd.to_datetime(df['startDate'], dayfirst=True)
df['endDate']= pd.to_datetime(df['endDate'], dayfirst=True)

s = (pd.concat([pd.Series(r.accountname,pd.date_range(r.startDate, r.endDate)) 
               for r in df.itertuples()])
       .index
       .value_counts()
       .sort_index())
print (s)
2020-12-31    1
2021-01-01    1
2021-01-02    3
2021-01-03    2
2021-01-04    2
2021-01-05    1
2021-01-06    1
dtype: int64

If need DataFrame:

df1 = s.rename_axis('date').reset_index(name='value')

Create a timeseries from start and end dates

Answers (1)

Related Questions