Tartaglia

Reputation: 1041

Add Missing Date Index with default values

I have a pandas DataFrame whose index represents the date (in monthly format) and which has multiple columns of numeric data. A minimal example is below:

 import pandas as pd

 dict1 = [{'var0': 45, 'var1': 3, 'var2': 2},
      {'var0': 32, 'var1': 4, 'var2': 4},
      {'var0': 23, 'var1': 5, 'var2': 8},
      {'var0': 22, 'var1': 2, 'var2': 12},]
 df = pd.DataFrame(dict1, index=['2016-08', '2016-09', '2016-11', '2016-12'])

Some of the months are missing, however: notice how the index jumps from Sep to Nov. I would like to fill in all of the missing months so that the new DataFrame contains an additional row for each missing month, with that month as the index and zeros in every column, that is:

 dict1 = [{'var0': 45, 'var1': 3, 'var2': 2},
      {'var0': 32, 'var1': 4, 'var2': 4},
      {'var0':  0, 'var1': 0, 'var2': 0},
      {'var0': 23, 'var1': 5, 'var2': 8},
      {'var0': 22, 'var1': 2, 'var2': 12},]
 df = pd.DataFrame(dict1, index=['2016-08', '2016-09', '2016-10', '2016-11', '2016-12'])

Can anyone recommend an approach?

Upvotes: 1

Views: 315

Answers (1)

jezrael

Reputation: 862471

Convert the index to a DatetimeIndex and use DataFrame.asfreq:

df.index = pd.to_datetime(df.index)
df = df.asfreq('MS', fill_value=0)

Or use DataFrame.reindex with pandas.date_range:

df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq='MS'), fill_value=0)

print(df)
            var0  var1  var2
2016-08-01    45     3     2
2016-09-01    32     4     4
2016-10-01     0     0     0
2016-11-01    23     5     8
2016-12-01    22     2    12
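
As a quick sanity check, the two approaches above can be run side by side on the question's sample data. This is only a sketch: the names rows, via_asfreq, full_range and via_reindex are made up for illustration, and the explicit format='%Y-%m' is an assumption added to make the string parsing unambiguous.

```python
import pandas as pd

# Sample data from the question; October 2016 is missing.
rows = [
    {'var0': 45, 'var1': 3, 'var2': 2},
    {'var0': 32, 'var1': 4, 'var2': 4},
    {'var0': 23, 'var1': 5, 'var2': 8},
    {'var0': 22, 'var1': 2, 'var2': 12},
]
df = pd.DataFrame(rows, index=['2016-08', '2016-09', '2016-11', '2016-12'])

# Parse the 'YYYY-MM' strings into month-start timestamps.
df.index = pd.to_datetime(df.index, format='%Y-%m')

# Approach 1: conform to month-start frequency, filling new rows with 0.
via_asfreq = df.asfreq('MS', fill_value=0)

# Approach 2: build the full month-start range and reindex onto it.
full_range = pd.date_range(df.index.min(), df.index.max(), freq='MS')
via_reindex = df.reindex(full_range, fill_value=0)

print(via_asfreq.equals(via_reindex))
```

Passing fill_value=0 in both calls keeps the columns as integers; without it the new rows would hold NaN and the columns would be upcast to float.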

A solution with monthly periods: convert the index with DatetimeIndex.to_period and reindex with pandas.period_range:

df.index = pd.to_datetime(df.index).to_period('M')
df = df.reindex(pd.period_range(df.index.min(), df.index.max(), freq='M'), fill_value=0)
print(df)
         var0  var1  var2
2016-08    45     3     2
2016-09    32     4     4
2016-10     0     0     0
2016-11    23     5     8
2016-12    22     2    12

Finally, if necessary, convert the index back to YYYY-MM strings with DatetimeIndex.strftime (PeriodIndex.strftime works the same way):

df.index = df.index.strftime('%Y-%m')
print(df)
         var0  var1  var2
2016-08    45     3     2
2016-09    32     4     4
2016-10     0     0     0
2016-11    23     5     8
2016-12    22     2    12
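
If the goal is to go from a string index straight back to a string index, the period-based steps above can be wrapped in a small helper. This is a minimal sketch, not part of pandas: fill_missing_months and its parameters are hypothetical names, and it assumes the index holds 'YYYY-MM' strings.

```python
import pandas as pd


def fill_missing_months(frame, fill_value=0):
    """Return a copy of frame with every missing month added as a filled row.

    Hypothetical helper; assumes frame.index contains 'YYYY-MM' strings.
    """
    out = frame.copy()
    # Strings -> timestamps -> monthly periods.
    out.index = pd.to_datetime(out.index, format='%Y-%m').to_period('M')
    # Every month between the first and last one present.
    full = pd.period_range(out.index.min(), out.index.max(), freq='M')
    out = out.reindex(full, fill_value=fill_value)
    # Back to 'YYYY-MM' strings.
    out.index = out.index.strftime('%Y-%m')
    return out


rows = [
    {'var0': 45, 'var1': 3, 'var2': 2},
    {'var0': 32, 'var1': 4, 'var2': 4},
    {'var0': 23, 'var1': 5, 'var2': 8},
    {'var0': 22, 'var1': 2, 'var2': 12},
]
df = pd.DataFrame(rows, index=['2016-08', '2016-09', '2016-11', '2016-12'])
result = fill_missing_months(df)
print(result)
```

The helper copies its input, so the original df keeps its four rows and its string index.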

Upvotes: 2
