Reputation: 13800
Lots of similar questions on here, but I couldn't find any that actually had observations with the same datetime. A minimum non-working example would be:
df = pd.DataFrame(
{"Date": np.tile([pd.Series(["2016-01", "2016-03"])], 2)[0],
"Group": [1,1,2,2],
"Obs":[1,2,5,6]})
Now I'd like to linearly interpolate the value for February 2016 by group, so the required output is
Date Group Obs
2016-01 1 1
2016-02 1 1.5
2016-03 1 2
2016-01 2 5
2016-02 2 5.5
2016-03 2 6
My understanding is that resample
should be able to do this (in my actual application I'm trying to move from quarterly to monthly, so have observations in Jan and Apr), but that requires some sort of time index, which I can't do as there are duplicates in the Date
column.
I'm assuming some sort of groupby
magic could help, but can't figure it out!
Upvotes: 4
Views: 5615
Reputation: 16251
Edit: replaced resample
with reindex
for a 2x speed improvement.
df.set_index('Date', inplace=True)
index = ['2016-01', '2016-02', '2016-03']
df.groupby('Group').apply(lambda df1: df1.reindex(index).interpolate())
Using groupby
is easy once you understand it just returns one dataframe (here df1
) per value in the grouping column.
Upvotes: 2
Reputation: 862731
You can use:
#convert column Date to datetime
df['Date'] = pd.to_datetime(df.Date)
print (df)
Date Group Obs
0 2016-01-01 1 1
1 2016-03-01 1 2
2 2016-01-01 2 5
3 2016-03-01 2 6
#groupby, resample and interpolate
df1 = df.groupby('Group').apply(lambda x : x.set_index('Date')
.resample('M')
.first()
.interpolate())
.reset_index(level=0, drop=True).reset_index()
#convert Date to period
df1['Date'] = df1.Date.dt.to_period('M')
print (df1)
Date Group Obs
0 2016-01 1.0 1.0
1 2016-02 1.0 1.5
2 2016-03 1.0 2.0
3 2016-01 2.0 5.0
4 2016-02 2.0 5.5
5 2016-03 2.0 6.0
EDIT:
Pandas API was changed (0.18.1), so now you can use:
df['Date'] = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)
df1 = df.groupby('Group').apply(lambda df1: df1.resample('M')
.first()
.interpolate())
.reset_index(level=0, drop=True).reset_index()
df1['Date'] = df1.Date.dt.to_period('M')
print (df1)
Date Group Obs
0 2016-01 1.0 1.0
1 2016-02 1.0 1.5
2 2016-03 1.0 2.0
3 2016-01 2.0 5.0
4 2016-02 2.0 5.5
5 2016-03 2.0 6.0
Upvotes: 4