Fill up missing observations at dates in interval in a pandas dataframe

Question

Let us say I have the following pandas dataframe:

+---------------------+---------+-------+-----+
| observed_cats_count |  year   | month | day |
+---------------------+---------+-------+-----+
|                   2 |    2019 |    10 |  19 |
|                   3 |    2019 |    10 |  18 |
|                   5 |    2019 |    10 |  16 |
+---------------------+---------+-------+-----+

And two boundary dates, say 2019-10-15 and 2019-10-20, and I know that all missing observations should have observed_cats_count=0.

How can insert a row for all missing dates in the interval and get the following dataframe:

+---------------------+---------+-------+-----+
| observed_cats_count |  year   | month | day |
+---------------------+---------+-------+-----+
|                   0 |    2019 |    10 |  20 |
|                   2 |    2019 |    10 |  19 |
|                   3 |    2019 |    10 |  18 |
|                   0 |    2019 |    10 |  17 |
|                   5 |    2019 |    10 |  16 |
|                   0 |    2019 |    10 |  15 |
+---------------------+---------+-------+-----+

jezrael · Accepted Answer

Idea is create DatetimeIndex by to_datetime with DataFrame.set_index, so possible use Series.reindex by all datetimes created by date_range, then create columns from DatetimeIndex, sorting it by DataFrame.sort_index and last remove it by DataFrame.reset_index with drop=True:

rng = pd.date_range('2019-10-15','2019-10-20')
df = (df.set_index(pd.to_datetime(df[['year','month','day']]))['observed_cats_count']
       .reindex(rng, fill_value=0).to_frame()
       .assign(year=lambda x: x.index.year, 
               month=lambda x: x.index.month, 
               day=lambda x: x.index.day)
       .sort_index(ascending=False)
       .reset_index(drop=True))
print (df)
   observed_cats_count  year  month  day
0                    0  2019     10   20
1                    2  2019     10   19
2                    3  2019     10   18
3                    0  2019     10   17
4                    5  2019     10   16
5                    0  2019     10   15

Fill up missing observations at dates in interval in a pandas dataframe

Answers (2)

Related Questions