Reputation: 894
I have a pandas dataframe that looks like:
import pandas as pd
df= pd.DataFrame({'Date':['2016-12-31','2018-12-31','2017-06-30','2019-06-30',],'Business':['FRC','FICC','FRC','FICC']\
,'Amount':[10,100,20,200]})
df
Amount Business Date
0 10 FRC 2016-12-31
1 100 FICC 2018-12-31
2 20 FRC 2017-06-30
3 200 FICC 2019-06-30
I wanted to use the re-index and interpolate functions to get an output such as :
import pandas as pd
df= pd.DataFrame({'Date':['2016-12-31','2018-12-31','2017-06-30','2019-06-30','2017-03-31','2019-03-31'],'Business':['FRC','FICC','FRC','FICC','FRC','FICC']\
,'Amount':[10,100,20,200,15,150]})
df
Amount Business Date
0 10 FRC 2016-12-31
1 100 FICC 2018-12-31
2 20 FRC 2017-06-30
3 200 FICC 2019-06-30
4 15 FRC 2017-03-31
5 150 FICC 2019-03-31
In essence, I need to divide the dataframe into two clusters and use the interpolate function on each cluster separately.So, the FRC cluster will have an interpolation of 15 and the FICC cluster an interpolation of 150
Upvotes: 0
Views: 183
Reputation: 16079
You could group the data first and then use resample
and interpolate
to set the values.
out = pd.DataFrame()
businesses = df.groupby('Business')
for idx, business in businesses:
name = business.Business.unique()[0]
temp = business.set_index('Date').resample('3M').interpolate()
temp.Business = name
out = out.append(temp)
out = out.reset_index(drop=True)
out
Date Amount Business
0 2018-12-31 100.0 FICC
1 2019-03-31 150.0 FICC
2 2019-06-30 200.0 FICC
3 2016-12-31 10.0 FRC
4 2017-03-31 15.0 FRC
5 2017-06-30 20.0 FRC
I interpreted a resample rate of 3 months based on your example, but if your real data is slightly different you could put in some logic to calculate the time interval of interest (days, months, years, ...) and use that as your resampling period.
Upvotes: 1