Reputation: 1559
Is it possible to compare 4 dataframes based on 2 columns and get result containing duplicate if appeared in 2 or more dataframes. The result should contain count of occurance. My dataframes look like
>>>df1
Circle Division Power
0 AAAA AA 25
1 BBBB BB 5
>>>df2
Circle Division Power
0 CCCC CC 25
1 BBBB BB 66
>>>df3
Circle Division Power
0 DDDD DD 55
1 FFFF FF 68
2 AAAA AA 87
>>>df4
Circle Division Power
0 AAAA AA 45
1 CCCC CC 56
Expected result
>>>result_df
Circle Division Power1 power2 power3 power4 Repeated
0 AAAA AA 25 - 87 45 3
1 BBBB BB 5 66 - - 2
2 CCCC CC - 25 - 56 2
I Tried to merge one by one But Stuck after that.
m12=pd.merge(df1, df2, on=['Circle','Division'], how='inner',suffixes=('1',' 2'))
m13=pd.merge(df1, df3, on=['Circle','Division'], how='inner',suffixes=('1',' 3'))
m14=pd.merge(df1, df4, on=['Circle','Division'], how='inner',suffixes=('1',' 4'))
m23=pd.merge(df2, df3, on=['Circle','Division'], how='inner',suffixes=('2',' 3'))
m24=pd.merge(df2, df4, on=['Circle','Division'], how='inner',suffixes=('2',' 4'))
m34=pd.merge(df3, df4, on=['Circle','Division'], how='inner',suffixes=('3',' 4'))
Upvotes: 2
Views: 33
Reputation: 862671
Use concat
with DataFrame.set_index
and parameter keys
for join all DataFrame together, flatten MultiIndex
.
Create new column by DataFrame.count
for get non NaN
s values per rows and filter by boolean indexing
:
dfs = [df1, df2, df3, df4]
comp = [x.set_index(['Circle','Division']) for x in dfs]
df = pd.concat(comp, axis=1, keys=(range(1, len(dfs)+ 1)))
df.columns = [f'{b}{a}' for a, b in df.columns]
df['Repeat'] = df.count(axis=1)
df = df[df['Repeat'] > 1]
df = df.reset_index()
print (df)
Circle Division Power1 Power2 Power3 Power4 Repeat
0 AAAA AA 25.0 NaN 87.0 45.0 3
1 BBBB BB 5.0 66.0 NaN NaN 2
2 CCCC CC NaN 25.0 NaN 56.0 2
Upvotes: 2