Number Logic
Number Logic

Reputation: 894

Reindex and Interpolate on complex Pandas dataframe

I have a pandas dataframe that looks like:

import pandas as pd
df= pd.DataFrame({'Date':['2016-12-31','2018-12-31','2017-06-30','2019-06-30',],'Business':['FRC','FICC','FRC','FICC']\
              ,'Amount':[10,100,20,200]})
df
   Amount Business        Date
0      10      FRC  2016-12-31
1     100     FICC  2018-12-31
2      20      FRC  2017-06-30
3     200     FICC  2019-06-30

I wanted to use the re-index and interpolate functions to get an output such as :

import pandas as pd
df= pd.DataFrame({'Date':['2016-12-31','2018-12-31','2017-06-30','2019-06-30','2017-03-31','2019-03-31'],'Business':['FRC','FICC','FRC','FICC','FRC','FICC']\
              ,'Amount':[10,100,20,200,15,150]})
df
   Amount Business        Date
0      10      FRC  2016-12-31
1     100     FICC  2018-12-31
2      20      FRC  2017-06-30
3     200     FICC  2019-06-30
4      15      FRC  2017-03-31
5     150     FICC  2019-03-31

In essence, I need to divide the dataframe into two clusters and use the interpolate function on each cluster separately.So, the FRC cluster will have an interpolation of 15 and the FICC cluster an interpolation of 150

Upvotes: 0

Views: 183

Answers (1)

Grr
Grr

Reputation: 16079

You could group the data first and then use resample and interpolate to set the values.

out = pd.DataFrame()
businesses = df.groupby('Business')
for idx, business in businesses:
    name = business.Business.unique()[0]
    temp = business.set_index('Date').resample('3M').interpolate()
    temp.Business = name
    out = out.append(temp)
out = out.reset_index(drop=True)
out
        Date  Amount Business
0 2018-12-31   100.0     FICC
1 2019-03-31   150.0     FICC
2 2019-06-30   200.0     FICC
3 2016-12-31    10.0      FRC
4 2017-03-31    15.0      FRC
5 2017-06-30    20.0      FRC

I interpreted a resample rate of 3 months based on your example, but if your real data is slightly different you could put in some logic to calculate the time interval of interest (days, months, years, ...) and use that as your resampling period.

Upvotes: 1

Related Questions