haneulkim
haneulkim

Reputation: 4928

join multiindex dataframe with single-index dataframe breaks multiindex format

Using pandas 1.1.3

I have one multi-index dataframe (not sure if this is the correct term) that looks as follows.

df1:

            A          B
       mean   max   mean max
name   

John    3      4     5    6
Cindy   1      6     7    8
Chris   3      5     5    5

df1.columns output MultiIndex([( 'A', 'mean'), ('A', 'max'), ( 'B', 'mean'), ('B', 'max')]).

  1. Am I calling this in correct terminology? Is it supposed to be called Multi-level column?

I have df2 that looks like this: df2:

 name      title
John     Data Engineer
Cindy    Data Analyst
Chris    Data Scientist

Left joining df1 to df2 works successfully however it breaks multi-index formatting.

pd.merge(df1, df2, how="left", left_index=True, right_on="name")

outputs

       (A, mean)    (A, max)    (B, mean)     (B, max)     title
name   
John       3           4            5            6         Data Engineer
Cindy      1           6            7            8         Data Analyst
Chris      3           5            5            5         Data Scientist

I want to perform join while keeping multi-index format, desired dataframe should look like this:

                  A                       B                  title
          mean         max          mean       max
 name   
John       3           4            5            6         Data Engineer
Cindy      1           6            7            8         Data Analyst
Chris      3           5            5            5         Data Scientist

Thanks in advance!

Upvotes: 2

Views: 262

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150785

Am I calling this in correct terminology? Is it supposed to be called Multi-level column?

Yes, it is MultiIndex. What you se is just the string representation of the index. On higher version of Pandas, df.columns would give you:

MultiIndex([('A', 'mean'),
            ('A',  'max'),
            ('B', 'mean'),
            ('B',  'max')],
           )

The problem here is how you view your expected output, specifically the last column? Is it still MultiIndex? If so, what's the second level value?

That said, you can match the expected output with:

df.join(df2.set_index('name').rename(columns=lambda x: (x,"")))

Output:

         A        B               title
      mean max mean max                
name                                   
John     3   4    5   6   Data Engineer
Cindy    1   6    7   8    Data Analyst
Chris    3   5    5   5  Data Scientist

Upvotes: 2

Related Questions