Pandas Multiindex reindex on levels

Question

I have multiple different series data saved as Multiindex(2-level) pandas dataframe. I want to know how to reindex a Multiindex dataframe so that I get indexes for all(hourly) data between two existing indexes.

So this is an example of my dataframe:

                                   A     B     C     D
tick       act
2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0                                        
           2019-01-10 00:00:00  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
           2019-01-10 02:00:00   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  16.0   9.0  92.0  53.0

And this is what I want to get:

tick       act
2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
           2019-01-09 21:00:00   NaT   NaN   NaN   NaN   NaN
           2019-01-09 22:00:00   NaT   NaN   NaN   NaN   NaN
           2019-01-09 23:00:00   NaT   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
           2019-01-09 23:00:00   NaT   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00   NaT   NaN   NaN   NaN   NaN
           2019-01-10 01:00:00   NaT   NaN   NaN   NaN   NaN
           2019-01-10 02:00:00   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  16.0   9.0  92.0  53.0

The important thing to remember is that the 'act' index level doesn't have same date range(for example in 2019-01-10 it starts with 2019-01-09 20:00:00 and ends with 2019-01-10 02:00:00 while for 2019-01-16 it starts with 2019-01-09 22:00:00 and ends with 2019-01-10 03:00:00).

I am mainly interested if there exists a solution using pandas methods without unnecessary external loops.

Dawei · Accepted Answer

At first reset_index of your data.

d = df.reset_index()

d

         tick                 act     A     B     C     D
0  2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
1  2019-01-10 2019-01-10 00:00:00  52.0  34.0   1.0   9.0
2  2019-01-10 2019-01-10 01:00:00  75.0  52.0  61.0   1.0
3  2019-01-10 2019-01-10 02:00:00  28.0  29.0  46.0  61.0
4  2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
5  2019-01-16 2019-01-10 02:00:00   2.0  22.0  41.0  59.0
6  2019-01-16 2019-01-10 03:00:00  16.0   9.0  92.0  53.0

Group your data by tick and apply the interpolate function to each group.

def interpolate(df):
    # generate new index
    new_index = pd.date_range(df.act.min(),df.act.max(),freq="h")
    # set `act` as index and unsampleing it to hours
    return df.set_index("act").reindex(new_index) 

d.groupby("tick").apply(interpolate)

It gives:

                                      tick     A     B     C     D
tick                                                              
2019-01-10 2019-01-09 20:00:00  2019-01-10   5.0   5.0   5.0   5.0
           2019-01-09 21:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-09 22:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-09 23:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00  2019-01-10  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  2019-01-10  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  2019-01-10  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  2019-01-16  91.0  42.0   3.0  34.0
           2019-01-09 23:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 01:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 02:00:00  2019-01-16   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  2019-01-16  16.0   9.0  92.0  53.0

Pandas Multiindex reindex on levels

Answers (1)

Related Questions