Chris Withers

Reputation: 11111

resampling a pandas dataframe from almost-weekly to daily

What's the most succinct way to resample this dataframe:

>>> uneven = pd.DataFrame({'a': [0, 12, 19]}, index=pd.DatetimeIndex(['2020-12-08', '2020-12-20', '2020-12-27']))
>>> print(uneven)
             a
2020-12-08   0
2020-12-20  12
2020-12-27  19

...into this dataframe:

>>> daily = pd.DataFrame({'a': range(20)}, index=pd.date_range('2020-12-08', periods=3*7-1, freq='D'))
>>> print(daily)
             a
2020-12-08   0
2020-12-09   1
...
2020-12-19  11
2020-12-20  12
2020-12-21  13
...
2020-12-27  19

NB: 12 days between the 8th and 20th Dec, 7 days between the 20th and 27th.

Also, to clarify the kind of interpolation/resampling I want, the daily differences should all be equal:

>>> print(daily.diff())
              a
2020-12-08  NaN
2020-12-09  1.0
2020-12-10  1.0
...
2020-12-19  1.0
2020-12-20  1.0
2020-12-21  1.0
...
2020-12-27  1.0

The actual data is hierarchical and has multiple columns, but I wanted to start with something I could get my head around:

                      first_dose  second_dose
date       areaCode                          
2020-12-08 E92000001         0.0          0.0
           N92000002         0.0          0.0
           S92000003         0.0          0.0
           W92000004         0.0          0.0
2020-12-20 E92000001    574829.0          0.0
           N92000002     16068.0          0.0
           S92000003     60333.0          0.0
           W92000004     24056.0          0.0
2020-12-27 E92000001    267809.0          0.0
           N92000002     14948.0          0.0
           S92000003     34535.0          0.0
           W92000004     12495.0          0.0
2021-01-03 E92000001    330037.0      20660.0
           N92000002      9669.0       1271.0
           S92000003     21446.0         44.0
           W92000004     14205.0         27.0

Upvotes: 1

Views: 223

Answers (1)

jezrael

Reputation: 862711

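I think you need to group by areaCode, resample each group to daily frequency, and interpolate linearly between the known dates: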

df = df.reset_index('areaCode').groupby('areaCode')[['first_dose','second_dose']].resample('D').interpolate()
print(df)
                         first_dose  second_dose
areaCode  date                                  
E92000001 2020-12-08       0.000000     0.000000
          2020-12-09   47902.416667     0.000000
          2020-12-10   95804.833333     0.000000
          2020-12-11  143707.250000     0.000000
          2020-12-12  191609.666667     0.000000
                            ...          ...
W92000004 2020-12-30   13227.857143    11.571429
          2020-12-31   13472.142857    15.428571
          2021-01-01   13716.428571    19.285714
          2021-01-02   13960.714286    23.142857
          2021-01-03   14205.000000    27.000000

[108 rows x 2 columns]
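
For the single-column example from the question, the same resample-then-interpolate idea can be tried without the groupby. This is only a minimal sketch: it rebuilds the `uneven` frame from the question and relies on `interpolate()` using its default linear method, which should be fine here because the rows are evenly spaced once the index is daily.

```python
import numpy as np
import pandas as pd

# The small frame from the question: gaps of 12 and 7 days between rows.
uneven = pd.DataFrame(
    {'a': [0, 12, 19]},
    index=pd.DatetimeIndex(['2020-12-08', '2020-12-20', '2020-12-27']),
)

# Upsample to one row per calendar day (the new rows start out as NaN),
# then fill them linearly between the known values.
daily = uneven.resample('D').interpolate()

print(daily.head())
print(daily.diff().tail())
```

Note that the interpolated column comes back as floats rather than integers; if integers are wanted, casting with `astype(int)` afterwards is one option when the values are known to be whole numbers.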

Upvotes: 1
