Change duplicated column names after merging multiple tables in python

Question

I have merged 4 files into one.

df1:
ID   name    location   case     pass
1    John      NY       tax       Y
2    Jack      NJ       payment   N
3    John      CA       remote    Y
4    Rose      MA       income    Y
df2:
ID   name    location   case   pass
1    John      NY       car     N
2    Jack      NJ       train   Y
3    John      CA       car     Y
4    Rose      MA       bike    N
df3:
ID   name    location   case     pass
1    John      NY       spring    Y
2    Jack      NJ       spring    Y
3    John      CA       fall      Y
4    Rose      MA       winter    N
df4:
ID   name    location   case    pass
1    John      NY       red      N
2    Jack      NJ       green    N
3    John      CA       yellow   Y
4    Rose      MA       yellow   Y

Here is how I merged those tables.

dfs = [df1,df2,df3,df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on=[ID,name,location]), dfs)

But the result is a bit hard to read. I need to convert those case_x,case_y,pass_x,pass_y to a specific column name. Can I do this when merging the tables?

 ID   name    location     case_x  pass_x  case_y      pass_y   case_x      pass_x  case_y   pass_y
    1    John      NY       tax       Y      car       N        spring      Y       red      N
    2    Jack      NJ       payment   N      train     Y        spring      Y      green     N
    3    John      CA       remote    Y      car       Y        fall        Y      yellow    Y 
    4    Rose      MA       income    Y      bike      N        winter      N      yellow    Y

Here is my expected output,

ID   name    location  case_money  pass_money  case_trans   pass_trans   case_season      pass_season  case_color  pass_color
1    John      NY       tax       Y           car                N        spring                 Y       red      N 
2    Jack      NJ       payment   N           train              Y        spring                 Y      green     N
3    John      CA       remote    Y           car                Y        fall                   Y      yellow    Y 
4    Rose      MA       income    Y           bike               N        winter                 N      yellow    Y

Andy L. · Accepted Answer

Using reduce is still possible with suffixes option and list pop

suff = ['_trans', '_season', '_color']
dfs = [df1,df2,df3,df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'], 
                                          suffixes=('', suff.pop(0))), dfs)

Out[1944]:
   ID  name location     case pass case_trans pass_trans case_season  \
0  1   John  NY       tax      Y    car        N          spring
1  2   Jack  NJ       payment  N    train      Y          spring
2  3   John  CA       remote   Y    car        Y          fall
3  4   Rose  MA       income   Y    bike       N          winter

  pass_season case_color pass_color
0  Y           red        N
1  Y           green      N
2  Y           yellow     Y
3  N           yellow     Y

Note: just be careful with the list suff. You need to re-initiate it before re-running the code.

If you want the first case, pass rename to _money, just chain additional rename

df_final = (reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'], 
                                          suffixes=('', suff.pop(0))), dfs)
                 .rename({'case': 'case_money', 'pass': 'pass_money'}, axis=1))

Out[1951]:
   ID  name location case_money pass_money case_trans pass_trans case_season  \
0  1   John  NY       tax        Y          car        N          spring
1  2   Jack  NJ       payment    N          train      Y          spring
2  3   John  CA       remote     Y          car        Y          fall
3  4   Rose  MA       income     Y          bike       N          winter

  pass_season case_color pass_color
0  Y           red        N
1  Y           green      N
2  Y           yellow     Y
3  N           yellow     Y

Doing this way, you only need to rename the first set case, pass, all other sets of case, pass already got named by suffixes through merge

Change duplicated column names after merging multiple tables in python

Answers (2)

Related Questions