Reputation: 1225
I want to create a time-series from a table that contains Start and End dates for subscriptions. The input table contains account names for subscriptions and their respective start and end dates.
input:
accountname startDate endDate
abc 31/12/2020 2/1/2021
xyz 2/1/2021 4/1/2021
pqr 4/1/2021 6/1/2021
stu 2/1/2021 3/1/2021
The value of the time-series will be the count of accountname
s that are active on that day i.e the count of accountname
s that are within their respective start and end dates.
output:
TSDate value (count of active subscriptions)
1/1/2021 1
2/1/2021 3
3/1/2021 2
4/1/2021 2
I can think of to solve this by creating a series of dates and iterating over the input each time to determine the count of records where TSdate>endDate.
What would be the right approach to solve this using python? Is there any library that I can leverage?
Upvotes: 1
Views: 2006
Reputation: 862511
I think you need first convert values to datetimes and then use concat
in list comprehension, last use Index.value_counts
:
df['startDate']= pd.to_datetime(df['startDate'], dayfirst=True)
df['endDate']= pd.to_datetime(df['endDate'], dayfirst=True)
s = (pd.concat([pd.Series(r.accountname,pd.date_range(r.startDate, r.endDate))
for r in df.itertuples()])
.index
.value_counts()
.sort_index())
print (s)
2020-12-31 1
2021-01-01 1
2021-01-02 3
2021-01-03 2
2021-01-04 2
2021-01-05 1
2021-01-06 1
dtype: int64
If need DataFrame:
df1 = s.rename_axis('date').reset_index(name='value')
Upvotes: 2