Reputation: 120
I have several different series stored in a pandas DataFrame with a two-level MultiIndex. I want to know how to reindex this DataFrame so that each group gets an index entry for every hour between its first and last existing index values.
Here is an example of my DataFrame:
                                   A     B     C     D
tick       act
2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
           2019-01-10 00:00:00  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
           2019-01-10 02:00:00   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  16.0   9.0  92.0  53.0
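For anyone who wants to reproduce this, the frame can be rebuilt roughly as follows. This is only a sketch: it assumes both index levels are datetime64 values and that the columns are floats, which the printed output suggests but does not prove.

```python
import pandas as pd

# Sample rows copied from the printed frame: (tick, act, A, B, C, D).
rows = [
    ("2019-01-10", "2019-01-09 20:00", 5.0, 5.0, 5.0, 5.0),
    ("2019-01-10", "2019-01-10 00:00", 52.0, 34.0, 1.0, 9.0),
    ("2019-01-10", "2019-01-10 01:00", 75.0, 52.0, 61.0, 1.0),
    ("2019-01-10", "2019-01-10 02:00", 28.0, 29.0, 46.0, 61.0),
    ("2019-01-16", "2019-01-09 22:00", 91.0, 42.0, 3.0, 34.0),
    ("2019-01-16", "2019-01-10 02:00", 2.0, 22.0, 41.0, 59.0),
    ("2019-01-16", "2019-01-10 03:00", 16.0, 9.0, 92.0, 53.0),
]
df = pd.DataFrame(rows, columns=["tick", "act", "A", "B", "C", "D"])

# Assumption: both index levels are real timestamps, not strings.
df["tick"] = pd.to_datetime(df["tick"])
df["act"] = pd.to_datetime(df["act"])

# Two-level MultiIndex: outer level `tick`, inner level `act`.
df = df.set_index(["tick", "act"])
print(df)
```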
And this is what I want to get:
tick act
2019-01-10 2019-01-09 20:00:00 5.0 5.0 5.0 5.0
2019-01-09 21:00:00 NaT NaN NaN NaN NaN
2019-01-09 22:00:00 NaT NaN NaN NaN NaN
2019-01-09 23:00:00 NaT NaN NaN NaN NaN
2019-01-10 00:00:00 52.0 34.0 1.0 9.0
2019-01-10 01:00:00 75.0 52.0 61.0 1.0
2019-01-10 02:00:00 28.0 29.0 46.0 61.0
2019-01-16 2019-01-09 22:00:00 91.0 42.0 3.0 34.0
2019-01-09 23:00:00 NaT NaN NaN NaN NaN
2019-01-10 00:00:00 NaT NaN NaN NaN NaN
2019-01-10 01:00:00 NaT NaN NaN NaN NaN
2019-01-10 02:00:00 2.0 22.0 41.0 59.0
2019-01-10 03:00:00 16.0 9.0 92.0 53.0
The important point is that the 'act' index level does not cover the same date range for every 'tick': for 2019-01-10 it runs from 2019-01-09 20:00:00 to 2019-01-10 02:00:00, while for 2019-01-16 it runs from 2019-01-09 22:00:00 to 2019-01-10 03:00:00.
I am mainly interested in whether there is a solution that uses pandas methods and avoids unnecessary explicit loops.
Upvotes: 1
Views: 912
Reputation: 1066
First, call reset_index on your data.
d = df.reset_index()
d
        tick                 act     A     B     C     D
0 2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
1 2019-01-10 2019-01-10 00:00:00  52.0  34.0   1.0   9.0
2 2019-01-10 2019-01-10 01:00:00  75.0  52.0  61.0   1.0
3 2019-01-10 2019-01-10 02:00:00  28.0  29.0  46.0  61.0
4 2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
5 2019-01-16 2019-01-10 02:00:00   2.0  22.0  41.0  59.0
6 2019-01-16 2019-01-10 03:00:00  16.0   9.0  92.0  53.0
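Then group your data by tick and apply the interpolate function to each group. Despite its name, the function only reindexes each group to an hourly range; it does not fill in any values.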
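import pandas as pd

def interpolate(df):
    # build an hourly index spanning this group's first and last `act`
    new_index = pd.date_range(df.act.min(), df.act.max(), freq="h")
    # set `act` as the index and upsample it to hourly steps;
    # hours missing from the group become all-NaN rows
    return df.set_index("act").reindex(new_index)

d.groupby("tick").apply(interpolate)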
It gives:
                                     tick     A     B     C     D
tick
2019-01-10 2019-01-09 20:00:00 2019-01-10   5.0   5.0   5.0   5.0
           2019-01-09 21:00:00        NaN   NaN   NaN   NaN   NaN
           2019-01-09 22:00:00        NaN   NaN   NaN   NaN   NaN
           2019-01-09 23:00:00        NaN   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00 2019-01-10  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00 2019-01-10  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00 2019-01-10  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00 2019-01-16  91.0  42.0   3.0  34.0
           2019-01-09 23:00:00        NaN   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00        NaN   NaN   NaN   NaN   NaN
           2019-01-10 01:00:00        NaN   NaN   NaN   NaN   NaN
           2019-01-10 02:00:00 2019-01-16   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00 2019-01-16  16.0   9.0  92.0  53.0
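Note that this result carries a leftover tick column that is empty in the newly added rows, and the inner index level has lost its name. If that is unwanted, one possible variant is to group on the index level directly, so tick never becomes a column. The sketch below rebuilds the sample frame so it runs on its own. The helper name fill_hours is made up for illustration, and the datetime dtypes are an assumption about the original data.

```python
import pandas as pd

# Rebuild the sample frame (assumed datetime index levels, float columns).
rows = [
    ("2019-01-10", "2019-01-09 20:00", 5.0, 5.0, 5.0, 5.0),
    ("2019-01-10", "2019-01-10 00:00", 52.0, 34.0, 1.0, 9.0),
    ("2019-01-10", "2019-01-10 01:00", 75.0, 52.0, 61.0, 1.0),
    ("2019-01-10", "2019-01-10 02:00", 28.0, 29.0, 46.0, 61.0),
    ("2019-01-16", "2019-01-09 22:00", 91.0, 42.0, 3.0, 34.0),
    ("2019-01-16", "2019-01-10 02:00", 2.0, 22.0, 41.0, 59.0),
    ("2019-01-16", "2019-01-10 03:00", 16.0, 9.0, 92.0, 53.0),
]
df = pd.DataFrame(rows, columns=["tick", "act", "A", "B", "C", "D"])
df["tick"] = pd.to_datetime(df["tick"])
df["act"] = pd.to_datetime(df["act"])
df = df.set_index(["tick", "act"])


def fill_hours(group):
    """Hypothetical helper: reindex one tick's rows to an hourly `act` range."""
    # Drop the constant outer level so only `act` remains as the index.
    per_act = group.droplevel("tick")
    # Hourly range between this group's own first and last timestamps.
    hourly = pd.date_range(per_act.index.min(), per_act.index.max(), freq="h")
    # Missing hours become all-NaN rows; restore the level name afterwards.
    return per_act.reindex(hourly).rename_axis("act")


# Grouping by index level keeps `tick` out of the columns; the group key is
# added back as the outer index level of the combined result.
out = df.groupby(level="tick", group_keys=True).apply(fill_hours)
print(out)
```

The loop over groups still happens inside groupby.apply, so this is no faster than the version above; it only tidies the shape of the result.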
Upvotes: 2