Jiayu Zhang
Jiayu Zhang

Reputation: 719

Change duplicated column names after merging multiple tables in python

I have merged 4 files into one.

df1:
ID   name    location   case     pass
1    John      NY       tax       Y
2    Jack      NJ       payment   N
3    John      CA       remote    Y
4    Rose      MA       income    Y
df2:
ID   name    location   case   pass
1    John      NY       car     N
2    Jack      NJ       train   Y
3    John      CA       car     Y
4    Rose      MA       bike    N
df3:
ID   name    location   case     pass
1    John      NY       spring    Y
2    Jack      NJ       spring    Y
3    John      CA       fall      Y
4    Rose      MA       winter    N
df4:
ID   name    location   case    pass
1    John      NY       red      N
2    Jack      NJ       green    N
3    John      CA       yellow   Y
4    Rose      MA       yellow   Y

Here is how I merged those tables.

dfs = [df1,df2,df3,df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on=[ID,name,location]), dfs)

But the result is a bit hard to read. I need to convert those case_x,case_y,pass_x,pass_y to a specific column name. Can I do this when merging the tables?

 ID   name    location     case_x  pass_x  case_y      pass_y   case_x      pass_x  case_y   pass_y
    1    John      NY       tax       Y      car       N        spring      Y       red      N
    2    Jack      NJ       payment   N      train     Y        spring      Y      green     N
    3    John      CA       remote    Y      car       Y        fall        Y      yellow    Y 
    4    Rose      MA       income    Y      bike      N        winter      N      yellow    Y  

Here is my expected output,

ID   name    location  case_money  pass_money  case_trans   pass_trans   case_season      pass_season  case_color  pass_color
1    John      NY       tax       Y           car                N        spring                 Y       red      N 
2    Jack      NJ       payment   N           train              Y        spring                 Y      green     N
3    John      CA       remote    Y           car                Y        fall                   Y      yellow    Y 
4    Rose      MA       income    Y           bike               N        winter                 N      yellow    Y  

Upvotes: 2

Views: 516

Answers (2)

Andy L.
Andy L.

Reputation: 25269

Using reduce is still possible with suffixes option and list pop

suff = ['_trans', '_season', '_color']
dfs = [df1,df2,df3,df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'], 
                                          suffixes=('', suff.pop(0))), dfs)

Out[1944]:
   ID  name location     case pass case_trans pass_trans case_season  \
0  1   John  NY       tax      Y    car        N          spring
1  2   Jack  NJ       payment  N    train      Y          spring
2  3   John  CA       remote   Y    car        Y          fall
3  4   Rose  MA       income   Y    bike       N          winter

  pass_season case_color pass_color
0  Y           red        N
1  Y           green      N
2  Y           yellow     Y
3  N           yellow     Y

Note: just be careful with the list suff. You need to re-initiate it before re-running the code.


If you want the first case, pass rename to _money, just chain additional rename

df_final = (reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'], 
                                          suffixes=('', suff.pop(0))), dfs)
                 .rename({'case': 'case_money', 'pass': 'pass_money'}, axis=1))

Out[1951]:
   ID  name location case_money pass_money case_trans pass_trans case_season  \
0  1   John  NY       tax        Y          car        N          spring
1  2   Jack  NJ       payment    N          train      Y          spring
2  3   John  CA       remote     Y          car        Y          fall
3  4   Rose  MA       income     Y          bike       N          winter

  pass_season case_color pass_color
0  Y           red        N
1  Y           green      N
2  Y           yellow     Y
3  N           yellow     Y

Doing this way, you only need to rename the first set case, pass, all other sets of case, pass already got named by suffixes through merge

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150785

My approach with concat and pivot_table:

names = ['money', 'trans', 'season', 'color']
dfs = [df1,df2,df3,df4]

new_df = (pd.concat(d.assign(name=n) for n,d in zip(names, dfs))
            .pivot_table(index=['ID','location', 'location'],
                         columns='name',
                         values=['case','pass'],
                         aggfunc='first')
         )
new_df.columns = [f'{x}_{y}' for x,y in new_df.columns]

Upvotes: 3

Related Questions