Hello lad

Reputation: 18790

How to interpolate grouped time series in a Pandas dataframe

I have a pd.DataFrame that looks like the following:

type  date  sum
A     Jan-1 1
A     Jan-3 2
B     Feb-1 1
B     Feb-2 3
B     Feb-5 6
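To experiment with the answers, the sample frame can be rebuilt as below. This assumes the date column holds plain strings such as Jan-1; the question does not show the actual dtype.

```python
import pandas as pd

# Sample data from the question; "date" is assumed to be a string column
df = pd.DataFrame({
    "type": ["A", "A", "B", "B", "B"],
    "date": ["Jan-1", "Jan-3", "Feb-1", "Feb-2", "Feb-5"],
    "sum": [1, 2, 1, 3, 6],
})
print(df)
```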

The task is to build a continuous daily time series for each type, with missing dates filled with 0.

The expected result is:

type  date  sum
A     Jan-1 1
A     Jan-2 0
A     Jan-3 2
B     Feb-1 1
B     Feb-2 3
B     Feb-3 0
B     Feb-4 0
B     Feb-5 6

Is it possible to do that with pandas or other Python tools?

The real dataset has millions of rows.

Upvotes: 0

Views: 811

Answers (1)

Ted Petrou

Reputation: 61967

You first need to convert the date column to datetime and move it into the index so you can take advantage of resampling. Afterwards you can convert the dates back to their original format.

import pandas as pd

# change to datetime (the data has no year, so pandas uses 1900)
df['date'] = pd.to_datetime(df['date'], format="%b-%d")
df = df.set_index('date')

# resample each type to daily frequency; asfreq() leaves NaN for the new days, which fillna(0) replaces
df1 = df.groupby('type').resample('D')['sum'].asfreq().fillna(0)
df1 = df1.reset_index()

# change back to original date format
df1['date'] = df1['date'].dt.strftime('%b-%d')

Output:

  type    date  sum
0    A  Jan-01  1.0
1    A  Jan-02  0.0
2    A  Jan-03  2.0
3    B  Feb-01  1.0
4    B  Feb-02  3.0
5    B  Feb-03  0.0
6    B  Feb-04  0.0
7    B  Feb-05  6.0
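Because the real dataset has millions of rows, one alternative worth benchmarking (a different technique from the resample approach above) builds every (type, date) pair directly with NumPy and then calls reindex once. This is a sketch under stated assumptions: each (type, date) pair occurs at most once (reindex raises on duplicate labels), the date strings follow the Jan-1 pattern, and its speed on the real data has not been measured.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the year is unknown, so pandas uses 1900
df = pd.DataFrame({
    "type": ["A", "A", "B", "B", "B"],
    "date": ["Jan-1", "Jan-3", "Feb-1", "Feb-2", "Feb-5"],
    "sum": [1, 2, 1, 3, 6],
})
df["date"] = pd.to_datetime(df["date"], format="%b-%d")

# First and last date of each type (groupby sorts the types)
bounds = df.groupby("type")["date"].agg(["min", "max"])
counts = ((bounds["max"] - bounds["min"]).dt.days + 1).to_numpy()

# Repeat each type and its start date once per day in its range
types = np.repeat(bounds.index.to_numpy(), counts)
starts = np.repeat(bounds["min"].to_numpy(), counts)

# Day offsets 0, 1, ..., n-1 within each group
group_starts = np.cumsum(counts) - counts
offsets = np.arange(counts.sum()) - np.repeat(group_starts, counts)
dates = starts + offsets.astype("timedelta64[D]")

full_index = pd.MultiIndex.from_arrays([types, dates], names=["type", "date"])

# Align the data to the complete grid; new days get 0 and sum stays integer
out = (
    df.set_index(["type", "date"])["sum"]
    .reindex(full_index, fill_value=0)
    .reset_index()
)
out["date"] = out["date"].dt.strftime("%b-%d")
print(out)
```

Passing fill_value=0 to reindex skips the NaN step, so sum keeps its integer dtype instead of becoming float as in the output above.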

Upvotes: 2
