Pandas resample by groups with duplicate datetimes

Question

Lots of similar questions on here, but I couldn't find any that actually had observations with the same datetime. A minimum non-working example would be:

df = pd.DataFrame(
    {"Date": np.tile([pd.Series(["2016-01", "2016-03"])], 2)[0],
     "Group": [1,1,2,2],
     "Obs":[1,2,5,6]})

Now I'd like to linearly interpolate the value for February 2016 by group, so the required output is

    Date    Group   Obs
    2016-01     1       1
    2016-02     1     1.5
    2016-03     1       2
    2016-01     2       5
    2016-02     2     5.5
    2016-03     2       6

My understanding is that resample should be able to do this (in my actual application I'm trying to move from quarterly to monthly, so have observations in Jan and Apr), but that requires some sort of time index, which I can't do as there are duplicates in the Date column.

I'm assuming some sort of groupby magic could help, but can't figure it out!

IanS · Accepted Answer

Edit: replaced resample with reindex for a 2x speed improvement.

df.set_index('Date', inplace=True)
index = ['2016-01', '2016-02', '2016-03']

df.groupby('Group').apply(lambda df1: df1.reindex(index).interpolate())

Using groupby is easy once you understand it just returns one dataframe (here df1) per value in the grouping column.

Pandas resample by groups with duplicate datetimes

Answers (2)

Related Questions