Reputation: 2151
I'm having trouble understanding pandas reindex
. I have a series of measurements, munged into a multi-index df
, and I'd like to reindex and interpolate those measurements to align them with some other data.
My actual data has ~7 index levels and several different measurements. I hope the solution for this toy data problem is applicable to my real data. It's "small data"; each individual measurement is a couple KB.
Here's a pair of toy problems, one which shows the expected behavior and one which doesn't seem to do anything.
Single-level index, works as expected:
"""
step,value
1,1
3,2
5,1
"""
df_i = pd.read_clipboard(sep=",").set_index("step")
print(df_i)
new_index = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
df_i = df_i.reindex(new_index).interpolate()
print(df_i)
Outputs, the original df and the re-indexed and interpolated one:
value
step
1 1
3 2
5 1
value
step
1 1.0
2 1.5
3 2.0
4 1.5
5 1.0
6 1.0
7 1.0
8 1.0
9 1.0
Works great.
Multi-index, currently not working:
"""
sample,meas_id,step,value
1,1,1,1
1,1,3,2
1,1,5,1
1,2,3,2
1,2,5,2
1,2,7,1
1,2,9,0
"""
df_mi = pd.read_clipboard(sep=",").set_index(["sample", "meas_id", "step"])
print(df_mi)
df_mi = df_mi.reindex(new_index, level="step").interpolate()
print(df_mi)
Output, unchanged after reindex (and therefore after interpolate):
value
sample meas_id step
1 1 1 1
3 2
5 1
2 3 2
5 2
7 1
9 0
value
sample meas_id step
1 1 1 1
3 2
5 1
2 3 2
5 2
7 1
9 0
How do I actually reindex a column in a multi-index df?
Here's the output I'd like, assuming linear interpolation:
value
sample meas_id step
1 1 1 1
2 1.5
3 2
5 1
6 1
7 1
8 1
9 1
2 1 NaN (or 2)
2 NaN (or 2)
3 2
4 2
5 2
6 1.5
7 1
8 0.5
9 0
I spent some sincere time looking over SO, and if the answer is in there, I missed it:
Fill multi-index Pandas DataFrame with interpolation
Resampling Within a Pandas MultiIndex
pandas multiindex dataframe, ND interpolation for missing values
Fill multi-index Pandas DataFrame with interpolation
https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-reindexing
Possibly related GitHub issues:
https://github.com/numpy/numpy/issues/11975
https://github.com/pandas-dev/pandas/issues/23104
https://github.com/pandas-dev/pandas/issues/17132
Upvotes: 0
Views: 1347
Reputation: 323396
IIUC create the index by using MultiIndex.from_product
, then just do reindex
idx=pd.MultiIndex.from_product([df_mi.index.levels[0],df_mi.index.levels[1],new_index])
df_mi.reindex(idx).interpolate()
Out[161]:
value
1 1 1 1.000000
2 1.500000
3 2.000000
4 1.500000
5 1.000000
6 1.142857
7 1.285714
8 1.428571
9 1.571429
2 1 1.714286 # here is bad , it take previous value into consideration
2 1.857143
3 2.000000
4 2.000000
5 2.000000
6 1.500000
7 1.000000
8 0.500000
9 0.000000
My think
def idx(x):
idx = pd.MultiIndex.from_product([x.index.get_level_values(0).unique(), x.index.get_level_values(1).unique(), new_index])
return idx
pd.concat([y.reindex(idx(y)).interpolate() for _,y in df_mi.groupby(level=[0,1])])
value
1 1 1 1.0
2 1.5
3 2.0
4 1.5
5 1.0
6 1.0
7 1.0
8 1.0
9 1.0
2 1 NaN
2 NaN
3 2.0
4 2.0
5 2.0
6 1.5
7 1.0
8 0.5
9 0.0
Upvotes: 1