Eyal S.
Eyal S.

Reputation: 1161

Join dataframes - one with multiindex columns and the other without

I'm trying to join two dataframes - one with multiindex columns and the other with a single column name. They have similar index.

I get the following warning:

"UserWarning: merging between different levels can give an unintended result (3 levels on the left, 1 on the right)"

For example:

import pandas as pd
import numpy as np

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

np.random.seed(2022)  # so the data is the same each time
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
df2 = pd.DataFrame(np.random.randn(3), index=['A', 'B', 'C'],columns=['w'])

df3 = df.join(df2)

DataFrame Views

df

first        bar                 baz                 foo                 qux          
second       one       two       one       two       one       two       one       two
A      -0.000528 -0.274901 -0.139286  1.984686  0.282109  0.760809  0.300982  0.540297
B       0.373497  0.377813 -0.090213 -2.305943  1.142760 -1.535654 -0.863752  1.016545
C       1.033964 -0.824492  0.018905 -0.383344 -0.304185  0.997292 -0.127274 -1.475886

df2

          w
A -1.940906
B  0.833649
C -0.567218

df3 - Result

   (bar, one)  (bar, two)  (baz, one)  (baz, two)  (foo, one)  (foo, two)  (qux, one)  (qux, two)         w
A   -0.000528   -0.274901   -0.139286    1.984686    0.282109    0.760809    0.300982    0.540297 -1.940906
B    0.373497    0.377813   -0.090213   -2.305943    1.142760   -1.535654   -0.863752    1.016545  0.833649
C    1.033964   -0.824492    0.018905   -0.383344   -0.304185    0.997292   -0.127274   -1.475886 -0.567218

What is the best way to join these two dataframes?

Upvotes: 20

Views: 22336

Answers (2)

jezrael
jezrael

Reputation: 863226

I think the simplest way is to convert df2 to MultiIndex, and then use concat or join:

df2.columns = pd.MultiIndex.from_tuples([('a','w')])
print (df2)
          a
          w
A -1.940906
B  0.833649
C -0.567218

Or:

df2.columns = [['a'], df2.columns]
print (df2)
          a
          w
A -1.940906
B  0.833649
C -0.567218

df3 = pd.concat([df, df2], axis=1)

Or:

df3 = df.join(df2)

Result:

print (df3)
first        bar                 baz                 foo                 qux                   a
second       one       two       one       two       one       two       one       two         w
A      -0.000528 -0.274901 -0.139286  1.984686  0.282109  0.760809  0.300982  0.540297 -1.940906
B       0.373497  0.377813 -0.090213 -2.305943  1.142760 -1.535654 -0.863752  1.016545  0.833649
C       1.033964 -0.824492  0.018905 -0.383344 -0.304185  0.997292 -0.127274 -1.475886 -0.567218

Additional Resources

Upvotes: 3

piRSquared
piRSquared

Reputation: 294488

It depends on what you want! Do you want the column from df2 to be aligned with the 1st or second level of columns from df?

You have to add a level to the columns of df2

Super cheezy with pd.concat

df.join(pd.concat([df2], axis=1, keys=['a']))

Better way

df2.columns = pd.MultiIndex.from_product([['a'], df2.columns])

df.join(df2)

enter image description here

Upvotes: 14

Related Questions