Reputation: 18790
I have data in a pd.DataFrame that looks like the following:
type date sum
A Jan-1 1
A Jan-3 2
B Feb-1 1
B Feb-2 3
B Feb-5 6
The task is to build a continuous daily time series for each type (missing dates should be filled with 0).
The expected result is:
type date sum
A Jan-1 1
A Jan-2 0
A Jan-3 2
B Feb-1 1
B Feb-2 3
B Feb-3 0
B Feb-4 0
B Feb-5 6
Is it possible to do that with pandas
or other Python tools?
The real dataset has millions of rows.
Upvotes: 0
Views: 811
Reputation: 61967
You must first convert your date column to datetime and put it in the index to take advantage of resampling. After resampling, you can convert the date back to its original format.
import pandas as pd

# change to datetime (the format has no year, so parsing defaults to 1900)
df['date'] = pd.to_datetime(df['date'], format="%b-%d")
df = df.set_index('date')
# resample each type to daily frequency to fill in missing dates
df1 = df.groupby('type').resample('D')['sum'].asfreq().fillna(0)
df1 = df1.reset_index()
# change back to original date format
df1['date'] = df1['date'].dt.strftime('%b-%d')
type date sum
0 A Jan-01 1.0
1 A Jan-02 0.0
2 A Jan-03 2.0
3 B Feb-01 1.0
4 B Feb-02 3.0
5 B Feb-03 0.0
6 B Feb-04 0.0
7 B Feb-05 6.0
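If you would rather keep sum as an integer column, here is a rough self-contained sketch of a variant. It rebuilds the sample frame from the question and prepends an assumed year (2023 is my own choice, not something from the question) so the dates are real calendar days. It also swaps asfreq().fillna(0) for sum(), which returns 0 for days with no rows. Note that sum() would add together duplicate dates within a type, which may or may not be what you want.

```python
import pandas as pd

# Sample data from the question; the year 2023 is an assumption added
# here so the month-day strings parse to real calendar days.
df = pd.DataFrame({
    "type": ["A", "A", "B", "B", "B"],
    "date": ["Jan-1", "Jan-3", "Feb-1", "Feb-2", "Feb-5"],
    "sum": [1, 2, 1, 3, 6],
})
df["date"] = pd.to_datetime("2023-" + df["date"], format="%Y-%b-%d")

# Resample each type to daily frequency. Summing a day with no rows
# yields 0, so the column stays integer and no fillna step is needed.
filled = (
    df.set_index("date")
      .groupby("type")["sum"]
      .resample("D")
      .sum()
      .reset_index()
)

# Convert back to the month-day display format (zero-padded day).
filled["date"] = filled["date"].dt.strftime("%b-%d")
print(filled)
```

Each type is only filled between its own first and last date, same as in the code above.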
Upvotes: 2