Reputation: 719
I have merged 4 files into one.
df1:
ID name location case pass
1 John NY tax Y
2 Jack NJ payment N
3 John CA remote Y
4 Rose MA income Y
df2:
ID name location case pass
1 John NY car N
2 Jack NJ train Y
3 John CA car Y
4 Rose MA bike N
df3:
ID name location case pass
1 John NY spring Y
2 Jack NJ spring Y
3 John CA fall Y
4 Rose MA winter N
df4:
ID name location case pass
1 John NY red N
2 Jack NJ green N
3 John CA yellow Y
4 Rose MA yellow Y
Here is how I merged those tables.
dfs = [df1,df2,df3,df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on=[ID,name,location]), dfs)
But the result is a bit hard to read. I need to convert those case_x,case_y,pass_x,pass_y
to a specific column name. Can I do this when merging the tables?
ID name location case_x pass_x case_y pass_y case_x pass_x case_y pass_y
1 John NY tax Y car N spring Y red N
2 Jack NJ payment N train Y spring Y green N
3 John CA remote Y car Y fall Y yellow Y
4 Rose MA income Y bike N winter N yellow Y
Here is my expected output,
ID name location case_money pass_money case_trans pass_trans case_season pass_season case_color pass_color
1 John NY tax Y car N spring Y red N
2 Jack NJ payment N train Y spring Y green N
3 John CA remote Y car Y fall Y yellow Y
4 Rose MA income Y bike N winter N yellow Y
Upvotes: 2
Views: 516
Reputation: 25269
Using reduce
is still possible with suffixes
option and list pop
suff = ['_trans', '_season', '_color']
dfs = [df1,df2,df3,df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'],
suffixes=('', suff.pop(0))), dfs)
Out[1944]:
ID name location case pass case_trans pass_trans case_season \
0 1 John NY tax Y car N spring
1 2 Jack NJ payment N train Y spring
2 3 John CA remote Y car Y fall
3 4 Rose MA income Y bike N winter
pass_season case_color pass_color
0 Y red N
1 Y green N
2 Y yellow Y
3 N yellow Y
Note: just be careful with the list suff
. You need to re-initiate it before re-running the code.
If you want the first case
, pass
rename to _money
, just chain additional rename
df_final = (reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'],
suffixes=('', suff.pop(0))), dfs)
.rename({'case': 'case_money', 'pass': 'pass_money'}, axis=1))
Out[1951]:
ID name location case_money pass_money case_trans pass_trans case_season \
0 1 John NY tax Y car N spring
1 2 Jack NJ payment N train Y spring
2 3 John CA remote Y car Y fall
3 4 Rose MA income Y bike N winter
pass_season case_color pass_color
0 Y red N
1 Y green N
2 Y yellow Y
3 N yellow Y
Doing this way, you only need to rename the first set case, pass
, all other sets of case, pass
already got named by suffixes
through merge
Upvotes: 2
Reputation: 150785
My approach with concat
and pivot_table
:
names = ['money', 'trans', 'season', 'color']
dfs = [df1,df2,df3,df4]
new_df = (pd.concat(d.assign(name=n) for n,d in zip(names, dfs))
.pivot_table(index=['ID','location', 'location'],
columns='name',
values=['case','pass'],
aggfunc='first')
)
new_df.columns = [f'{x}_{y}' for x,y in new_df.columns]
Upvotes: 3