Reputation: 281
I've got a DataFrame like this:
import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'var1': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
10: 0.0},
'var2': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
10: 0.0},
'var3': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
10: 0.0},
'var4': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
10: 0.0}})
And I'd like to fill the missing indices, so I used .reindex first:
df.reindex(np.arange(1, 11))
And I got:
    var1  var2  var3  var4
1    0.0   0.0   0.0   0.0
2    0.0   0.0   0.0   0.0
3    0.0   0.0   0.0   0.0
4    0.0   0.0   0.0   0.0
5    NaN   NaN   NaN   NaN
6    0.0   0.0   0.0   0.0
7    0.0   0.0   0.0   0.0
8    0.0   0.0   0.0   0.0
9    NaN   NaN   NaN   NaN
10   0.0   0.0   0.0   0.0
However, I need to keep track of multiple indices, and when I constructed a MultiIndex and passed it to .reindex, it didn't work as I expected:
df.reindex(pd.MultiIndex.from_product([["A"], np.arange(1, 11)]))
      var1  var2  var3  var4
A 1    NaN   NaN   NaN   NaN
  2    NaN   NaN   NaN   NaN
  3    NaN   NaN   NaN   NaN
  4    NaN   NaN   NaN   NaN
  5    NaN   NaN   NaN   NaN
  6    NaN   NaN   NaN   NaN
  7    NaN   NaN   NaN   NaN
  8    NaN   NaN   NaN   NaN
  9    NaN   NaN   NaN   NaN
  10   NaN   NaN   NaN   NaN
I can't really understand what's going on here, and the documentation of .reindex is not quite clear to me. Can someone explain why a MultiIndex can't be passed to .reindex, or what I am doing wrong?
@jazrael provided a good solution for moving from a single-level index to a 2-level MultiIndex. However, what about the case where we want to reindex from a 2-level MultiIndex to a 3-level MultiIndex?
E.g.:
df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index])
      var1  var2  var3  var4
1 0    0.0   0.0   0.0   0.0
  1    0.0   0.0   0.0   0.0
  2    0.0   0.0   0.0   0.0
  3    0.0   0.0   0.0   0.0
2 4    0.0   0.0   0.0   0.0
  6    0.0   0.0   0.0   0.0
  7    0.0   0.0   0.0   0.0
  8    0.0   0.0   0.0   0.0
  10   0.0   0.0   0.0   0.0
And I'd like to get:
        var1  var2  var3  var4
A 1 0    0.0   0.0   0.0   0.0
    1    0.0   0.0   0.0   0.0
    2    0.0   0.0   0.0   0.0
    3    0.0   0.0   0.0   0.0
  2 4    0.0   0.0   0.0   0.0
    5    NaN   NaN   NaN   NaN
    6    0.0   0.0   0.0   0.0
    7    0.0   0.0   0.0   0.0
    8    0.0   0.0   0.0   0.0
    9    NaN   NaN   NaN   NaN
    10   0.0   0.0   0.0   0.0
Upvotes: 7
Views: 878
Reputation: 4263
You can create a new index with the extra level and perform an explicit DataFrame join to get what you want.
df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index], names=["key1", "key2"])
# If df's index is already created, do df.rename_axis(["key1", "key2"], inplace=True)
new_index = pd.MultiIndex.from_arrays([['A']*11, np.repeat([1, 2], [4, 7]), range(11)],
names=["new_key", *df.index.names])
output = pd.DataFrame([], index=new_index).join(df, on=df.index.names) # Join on overlapped index levels based on names
Output:
                   var1  var2  var3  var4
new_key key1 key2
A       1    0      0.0   0.0   0.0   0.0
             1      0.0   0.0   0.0   0.0
             2      0.0   0.0   0.0   0.0
             3      0.0   0.0   0.0   0.0
        2    4      0.0   0.0   0.0   0.0
             5      NaN   NaN   NaN   NaN
             6      0.0   0.0   0.0   0.0
             7      0.0   0.0   0.0   0.0
             8      0.0   0.0   0.0   0.0
             9      NaN   NaN   NaN   NaN
             10     0.0   0.0   0.0   0.0
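As a possible alternative to the join, here is a minimal sketch that keeps using .reindex: first give the frame the same number of index levels as the target (by prepending the constant "A" level), then reindex against the full 3-level index. The frame below is a simplified stand-in for the one in the question, and the names df3, target and out are only illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame: 2-level index, all values 0.0
inner = [0, 1, 2, 3, 4, 6, 7, 8, 10]
df = pd.DataFrame(0.0, index=inner, columns=["var1", "var2", "var3", "var4"])
df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index])

# Prepend the constant outer level so the existing labels become 3-tuples
df3 = df.copy()
df3.index = pd.MultiIndex.from_arrays(
    [
        ["A"] * len(df),
        df.index.get_level_values(0),
        df.index.get_level_values(1),
    ]
)

# Full target index: same tuple shape, including the missing 5 and 9
target = pd.MultiIndex.from_arrays(
    [["A"] * 11, np.repeat([1, 2], [4, 7]), np.arange(11)]
)

# Both sides now hold 3-tuples, so labels match and only the gaps become NaN
out = df3.reindex(target)
print(out)
```

The point of the design is that .reindex compares whole labels, so both indexes need the same number of levels before the call.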
Upvotes: 1
Reputation: 862431
Because you are reindexing a simple (non-MultiIndex) index, it is necessary to set level=1 so that the existing labels are matched against the second level of the new MultiIndex:
df = df.reindex(pd.MultiIndex.from_product([["A"], np.arange(1, 11)]), level=1)
print(df)
      var1  var2  var3  var4
A 1    0.0   0.0   0.0   0.0
  2    0.0   0.0   0.0   0.0
  3    0.0   0.0   0.0   0.0
  4    0.0   0.0   0.0   0.0
  5    NaN   NaN   NaN   NaN
  6    0.0   0.0   0.0   0.0
  7    0.0   0.0   0.0   0.0
  8    0.0   0.0   0.0   0.0
  9    NaN   NaN   NaN   NaN
  10   0.0   0.0   0.0   0.0
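To illustrate why the plain call in the question returns only NaN: .reindex matches labels, and the labels of the original index are integers while the labels of the new MultiIndex are tuples such as ("A", 1), so nothing lines up. Below is a minimal sketch of that mismatch, together with an alternative that does not rely on level=: reindex on the flat labels first and attach the outer level afterwards. The frame is a simplified stand-in for the one in the question, and the names target and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame: flat integer index with gaps at 5 and 9
df = pd.DataFrame(
    0.0,
    index=[0, 1, 2, 3, 4, 6, 7, 8, 10],
    columns=["var1", "var2", "var3", "var4"],
)

target = pd.MultiIndex.from_product([["A"], np.arange(1, 11)])

# The target labels are tuples, the existing labels are plain integers,
# so the two sets of labels have nothing in common
common = set(target) & set(df.index)
print(common)

# Alternative: fill the gaps on the flat index, then add the outer level
filled = df.reindex(np.arange(1, 11))
filled.index = pd.MultiIndex.from_product([["A"], filled.index])
print(filled)
```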
Upvotes: 3