Nils Gudat
Nils Gudat

Reputation: 13800

Pandas resample by groups with duplicate datetimes

Lots of similar questions on here, but I couldn't find any that actually had observations with the same datetime. A minimum non-working example would be:

df = pd.DataFrame(
    {"Date": np.tile([pd.Series(["2016-01", "2016-03"])], 2)[0],
     "Group": [1,1,2,2],
     "Obs":[1,2,5,6]})

Now I'd like to linearly interpolate the value for February 2016 by group, so the required output is

    Date    Group   Obs
    2016-01     1       1
    2016-02     1     1.5
    2016-03     1       2
    2016-01     2       5
    2016-02     2     5.5
    2016-03     2       6

My understanding is that resample should be able to do this (in my actual application I'm trying to move from quarterly to monthly, so have observations in Jan and Apr), but that requires some sort of time index, which I can't do as there are duplicates in the Date column.

I'm assuming some sort of groupby magic could help, but can't figure it out!

Upvotes: 4

Views: 5615

Answers (2)

IanS
IanS

Reputation: 16251

Edit: replaced resample with reindex for a 2x speed improvement.

df.set_index('Date', inplace=True)
index = ['2016-01', '2016-02', '2016-03']

df.groupby('Group').apply(lambda df1: df1.reindex(index).interpolate())

Using groupby is easy once you understand it just returns one dataframe (here df1) per value in the grouping column.

Upvotes: 2

jezrael
jezrael

Reputation: 862731

You can use:

#convert column Date to datetime
df['Date'] = pd.to_datetime(df.Date)
print (df)
        Date  Group  Obs
0 2016-01-01      1    1
1 2016-03-01      1    2
2 2016-01-01      2    5
3 2016-03-01      2    6

#groupby, resample and interpolate
df1 = df.groupby('Group').apply(lambda x : x.set_index('Date')
                                            .resample('M')
                                            .first()
                                            .interpolate())
                        .reset_index(level=0, drop=True).reset_index()

#convert Date to period
df1['Date'] = df1.Date.dt.to_period('M')
print (df1)
     Date  Group  Obs
0 2016-01    1.0  1.0
1 2016-02    1.0  1.5
2 2016-03    1.0  2.0
3 2016-01    2.0  5.0
4 2016-02    2.0  5.5
5 2016-03    2.0  6.0

EDIT:

Pandas API was changed (0.18.1), so now you can use:

df['Date'] = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)

df1 = df.groupby('Group').apply(lambda df1: df1.resample('M')
                                               .first()
                                               .interpolate())
                         .reset_index(level=0, drop=True).reset_index()

df1['Date'] = df1.Date.dt.to_period('M')
print (df1)
     Date  Group  Obs
0 2016-01    1.0  1.0
1 2016-02    1.0  1.5
2 2016-03    1.0  2.0
3 2016-01    2.0  5.0
4 2016-02    2.0  5.5
5 2016-03    2.0  6.0

Upvotes: 4

Related Questions