Reputation: 14062
I have a df which looks like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([['1|mm','2|lll'],['A|ljjh','B|ldjdj']])
1|mm 2|lll
A|ljjh B|ldjdj A|ljjh B|ldjdj
0 0.599202 0.093917 0.582809 0.683346
1 0.902717 0.343215 0.222960 0.238709
2 0.808473 0.290253 0.276607 0.775530
3 0.197891 0.505197 0.243890 0.011838
I would like to split the column labels for each level like so:
columnlabel.split("|")[0]
I'm not sure what the best method to do this? should I create a new list and assign that to df.columns or can I do it inplace??
expected output
1 2
A B A B
0 0.599202 0.093917 0.582809 0.683346
1 0.902717 0.343215 0.222960 0.238709
2 0.808473 0.290253 0.276607 0.775530
3 0.197891 0.505197 0.243890 0.011838
Upvotes: 1
Views: 573
Reputation: 862501
You can use get_level_values
with split
for parsing, create new list of tuples
and last new MultiIndex
from_tuples
:
new_names = list(zip(df.columns.get_level_values(0).str.split('|').str[0],
df.columns.get_level_values(1).str.split('|').str[0]))
print (new_names)
[('1', 'A'), ('1', 'B'), ('2', 'A'), ('2', 'B')]
df.columns = pd.MultiIndex.from_tuples(new_names)
print (df)
1 2
A B A B
0 0.400125 0.007743 0.423123 0.662878
1 0.787079 0.314668 0.798404 0.702267
2 0.451037 0.333846 0.030534 0.823515
3 0.135365 0.785421 0.777839 0.248622
Upvotes: 1