Reputation: 4928
Using pandas 1.1.3
I have one multi-index dataframe (not sure if this is the correct term) that looks as follows.
df1:
A B
mean max mean max
name
John 3 4 5 6
Cindy 1 6 7 8
Chris 3 5 5 5
df1.columns output MultiIndex([( 'A', 'mean'), ('A', 'max'), ( 'B', 'mean'), ('B', 'max')]).
I have df2 that looks like this: df2:
name title
John Data Engineer
Cindy Data Analyst
Chris Data Scientist
Left joining df1 to df2 works successfully however it breaks multi-index formatting.
pd.merge(df1, df2, how="left", left_index=True, right_on="name")
outputs
(A, mean) (A, max) (B, mean) (B, max) title
name
John 3 4 5 6 Data Engineer
Cindy 1 6 7 8 Data Analyst
Chris 3 5 5 5 Data Scientist
I want to perform join while keeping multi-index format, desired dataframe should look like this:
A B title
mean max mean max
name
John 3 4 5 6 Data Engineer
Cindy 1 6 7 8 Data Analyst
Chris 3 5 5 5 Data Scientist
Thanks in advance!
Upvotes: 2
Views: 262
Reputation: 150785
Am I calling this in correct terminology? Is it supposed to be called Multi-level column?
Yes, it is MultiIndex. What you se is just the string representation of the index. On higher version of Pandas, df.columns
would give you:
MultiIndex([('A', 'mean'),
('A', 'max'),
('B', 'mean'),
('B', 'max')],
)
The problem here is how you view your expected output, specifically the last column? Is it still MultiIndex? If so, what's the second level value?
That said, you can match the expected output with:
df.join(df2.set_index('name').rename(columns=lambda x: (x,"")))
Output:
A B title
mean max mean max
name
John 3 4 5 6 Data Engineer
Cindy 1 6 7 8 Data Analyst
Chris 3 5 5 5 Data Scientist
Upvotes: 2