Reputation: 14037
How to rename columns with multiple levels after pandas pivot operation?
Here's some code to generate test data:
import pandas as pd
df = pd.DataFrame({
'c0': ['A','A','B','C'],
'c01': ['A','A1','B','C'],
'c02': ['b','b','d','c'],
'v1': [1, 3,4,5],
'v2': [1, 3,4,5]})
print(df)
gives a test dataframe:
c0 c01 c02 v1 v2
0 A A b 1 1
1 A A1 b 3 3
2 B B d 4 4
3 C C c 5 5
applying pivot
df2 = pd.pivot_table(df, index=["c0"], columns=["c01","c02"], values=["v1","v2"])
df2 = df2.reset_index()
gives
how to rename the columns by joining levels?
with format
<c01 value>_<c02 value>_<v1>
for example first column should look like
"A_b_v1"
The order of joining levels isn't really important to me.
Upvotes: 24
Views: 69459
Reputation: 23071
If some column names are not strings, you can map
the column names to strings and join
them.
df2.columns = ['_'.join(map(str, c)).strip('_') for c in df2]
If you want to chain the renaming method to pivot_table
method to put it in a pipeline, you can do so using pipe
and set_axis
.
Moreover, you can also reorder column levels using reorder_levels
, e.g. <c01 value>_<c02 value>_<v1>
instead of <v1>_<c01 value>_<c02 value>
df2 = (
df.pivot_table(index=["c0"], columns=["c01","c02"], values=['1','2'])
.reorder_levels([1,2,0], axis=1) # makes "v1","v2" the last level
.pipe(lambda x: x.set_axis(
map('_'.join, x) # if all column names are strings
#('_'.join(map(str, c)) for c in x) # if some column names are not strings
, axis=1)
) # rename columns
.reset_index()
)
Upvotes: 5
Reputation: 214927
If you want to coalesce the multi-index into a single string index without caring about the index level order, you can simply map
a join
function over the columns, and assign the result list back:
df2.columns = list(map("_".join, df2.columns))
And for your question, you can loop through the columns where each element is a tuple, unpack the tuple and join them back in the order you want:
df2 = pd.pivot_table(df, index=["c0"], columns=["c01","c02"], values=["v1","v2"])
# Use the list comprehension to make a list of new column names and assign it back
# to the DataFrame columns attribute.
df2.columns = ["_".join((j,k,i)) for i,j,k in df2.columns]
df2.reset_index()
Upvotes: 38