sunless
sunless

Reputation: 637

Fill up missing observations at dates in interval in a pandas dataframe

Let us say I have the following pandas dataframe:

+---------------------+---------+-------+-----+
| observed_cats_count |  year   | month | day |
+---------------------+---------+-------+-----+
|                   2 |    2019 |    10 |  19 |
|                   3 |    2019 |    10 |  18 |
|                   5 |    2019 |    10 |  16 |
+---------------------+---------+-------+-----+

And two boundary dates, say 2019-10-15 and 2019-10-20, and I know that all missing observations should have observed_cats_count=0.

How can insert a row for all missing dates in the interval and get the following dataframe:

+---------------------+---------+-------+-----+
| observed_cats_count |  year   | month | day |
+---------------------+---------+-------+-----+
|                   0 |    2019 |    10 |  20 |
|                   2 |    2019 |    10 |  19 |
|                   3 |    2019 |    10 |  18 |
|                   0 |    2019 |    10 |  17 |
|                   5 |    2019 |    10 |  16 |
|                   0 |    2019 |    10 |  15 |
+---------------------+---------+-------+-----+

Upvotes: 1

Views: 116

Answers (2)

Andy L.
Andy L.

Reputation: 25239

I would use pd.date_range to construct a new dataframe and merge back to df and fillna

dates = pd.date_range('2019-10-20', '2019-10-15', freq='-1D')
df1 = pd.DataFrame({'year': dates.year, 'month': dates.month, 'day': dates.day})
df2 = df1.merge(df, how='left').fillna(0)

Out[413]:
   year  month  day  observed_cats_count
0  2019     10   20                  0.0
1  2019     10   19                  2.0
2  2019     10   18                  3.0
3  2019     10   17                  0.0
4  2019     10   16                  5.0
5  2019     10   15                  0.0

Upvotes: 2

jezrael
jezrael

Reputation: 862451

Idea is create DatetimeIndex by to_datetime with DataFrame.set_index, so possible use Series.reindex by all datetimes created by date_range, then create columns from DatetimeIndex, sorting it by DataFrame.sort_index and last remove it by DataFrame.reset_index with drop=True:

rng = pd.date_range('2019-10-15','2019-10-20')
df = (df.set_index(pd.to_datetime(df[['year','month','day']]))['observed_cats_count']
       .reindex(rng, fill_value=0).to_frame()
       .assign(year=lambda x: x.index.year, 
               month=lambda x: x.index.month, 
               day=lambda x: x.index.day)
       .sort_index(ascending=False)
       .reset_index(drop=True))
print (df)
   observed_cats_count  year  month  day
0                    0  2019     10   20
1                    2  2019     10   19
2                    3  2019     10   18
3                    0  2019     10   17
4                    5  2019     10   16
5                    0  2019     10   15

Upvotes: 1

Related Questions