Reputation: 8811
This is similar to the problem I asked here. However, I found out that the data I am working is not always consistent. For, example say :
import pandas as pd
df = pd.DataFrame(pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]],columns=["X_a","Y_c","X_b","Y_a"]))
X_a Y_c X_b Y_a
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
Now you can see that X
does not have corresponding c
column and Y
does not have corresponding b
column. Now when I want to create the multi-level index, I want the dataframe to look like this:
X Y
a b c a b c
0 1 3 -1 4 -1 2
1 5 7 -1 8 -1 6
2 9 11 -1 12 -1 10
So as you can see, I want the split in such a way that all upper level columns should have the same lower level columns. Since, the dataset is positve, I am thinking of filling the missing columns with -1, although I am open for suggestions on this. The closest thing I found to my problem was this answer. However, I cannot make it to somehow work with MultiLevel Index like in my previous question. Any help is appreciated.
Upvotes: 13
Views: 5090
Reputation: 402852
Create a MultiIndex
and set df.columns
.
idx = df.columns.str.split('_', expand=True)
idx
MultiIndex(levels=[['X', 'Y'], ['a', 'b', 'c']],
labels=[[0, 1, 0, 1], [0, 2, 1, 0]])
df.columns = idx
Now, with the existing MultiIndex
, create a new index and use that to reindex
the original.
idx = pd.MultiIndex.from_product([idx.levels[0], idx.levels[1]])
idx
MultiIndex(levels=[['X', 'Y'], ['a', 'b', 'c']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
df.reindex(columns=idx, fill_value=-1)
X Y
a b c a b c
0 1 3 -1 4 -1 2
1 5 7 -1 8 -1 6
2 9 11 -1 12 -1 10
Upvotes: 24