Pandas fill-in-zero for by missing dates *defined by* group

Question

Question: Using pandas -- how to efficiently fill-in missing dates with zero values, with monthly (e.g. last day indexed) frequency, relative to the min/max date values per group?

Edit do not assume input dates to correspond to the last day in its month. To deal with this issue, add the following line to the proposed answers below:

df.date = df.date + pd.offsets.MonthEnd(0)

Without this fix, filled in values with freq='M' can results in NA's...!

Note: pandas version 0.24.2

Example input:

data = [{'name': 'A', 'date': '2019-01-01', 'val': 10},
{'name': 'A', 'date': '2019-04-30', 'val': 2},
{'name': 'B', 'date': '2019-02-15', 'val': 6},
{'name': 'B', 'date': '2019-05-01', 'val': 5}]

df = pd.DataFrame(data)

         date name  val
0  2019-01-01    A   10
1  2019-04-30    A    2
2  2019-02-15    B    6
3  2019-05-01    B    5

Note that the dates in the input are not necessarily the first or last day of their corresponding month.

Example desired output

         date name  val
0  2019-01-31    A   10
1  2019-02-28    A    0
2  2019-03-31    A    0
3  2019-04-30    A    2
4  2019-02-28    B    6
5  2019-03-31    B    0
6  2019-04-30    B    0
7  2019-05-31    B    5

Attempts:

The following works at the index level but fills everything with NA:

df['date'] = pd.to_datetime(df['date'])

dg = df.groupby('name').apply(lambda x: x.reindex(pd.date_range(min(x.date), max(x.date), freq='M')))

Also:

Pandas filling missing dates and values within group

The above link's answer appear not be not relative to for each group, but rather the entire dataset's min/max date values.

Andy L. · Accepted Answer

I would use groupby, resample and asfreq (edit: as you updated the question on non-MonthEnd dates. I added pd.offsets.MonthEnd as you proposed)

df.date = df.date + pd.offsets.MonthEnd(0) 
(df.set_index('date').groupby('name').resample('M')
                     .asfreq(fill_value=0).drop('name',1)
                     .reset_index())

Out[550]:
  name       date  val
0    A 2019-01-31   10
1    A 2019-02-28    0
2    A 2019-03-31    0
3    A 2019-04-30    2
4    B 2019-02-28    6
5    B 2019-03-31    0
6    B 2019-04-30    0
7    B 2019-05-31    5

Pandas fill-in-zero for by missing dates defined by group

Answers (2)

Related Questions