Tristan Tran
Tristan Tran

Reputation: 1513

Merging multiple data frames causing duplicate column names

This thread here suggested to use reduce to merge multiple data frames at once.

df1= pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
df2= pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
df3= pd.DataFrame({'key': ['A', 'C', 'E', 'F'], 'value': np.random.randn(4)})
df4= pd.DataFrame({'key': ['A', 'B', 'C', 'F'], 'value': np.random.randn(4)})

df_list = [df1, df2, df3, df4]

from functools import reduce
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['key'], how='outer'), df_list )

The merge is executed, but there is a warning

<stdin>:1: FutureWarning: Passing 'suffixes' which cause duplicate columns {'value_x'} in the result is deprecated and will raise a MergeError in a future version.

and df_meged has columns with exactly duplicated names. How do I force distinct names for each column?

Upvotes: 1

Views: 3879

Answers (2)

BENY
BENY

Reputation: 323306

You can do

s = pd.concat([x.set_index('key') for x in df_list],axis = 1,keys=range(len(df_list)))
s.columns = s.columns.map('{0[1]}_{0[0]}'.format)
s = s.reset_index()
s
Out[236]: 
  key   value_0   value_1   value_2   value_3
0   A -1.957968       NaN -0.852135 -0.976960
1   B  1.545932 -0.276838       NaN  0.197615
2   C -2.149727       NaN -0.364382  0.349993
3   D  0.524990 -0.476655       NaN       NaN
4   E       NaN -2.135870  0.798782       NaN
5   F       NaN  1.456544 -0.255705  0.447279

Upvotes: 2

wwnde
wwnde

Reputation: 26676

An outer join in this scenario is tantamount to appending on the zero axis. You therefore are better of using the two columns in the dfs as the join keys. code below

from functools import reduce
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['key', 'value'], how='outer'), df_list )

Upvotes: 1

Related Questions