Pavan Kumar T S
Pavan Kumar T S

Reputation: 1559

How to compare and get count of coulmn value occurance from multiple dataframes

Is it possible to compare 4 dataframes based on 2 columns and get result containing duplicate if appeared in 2 or more dataframes. The result should contain count of occurance. My dataframes look like

>>>df1
  Circle Division Power 
0 AAAA   AA       25   
1 BBBB   BB       5     
>>>df2
  Circle Division Power 
0 CCCC   CC       25   
1 BBBB   BB       66
>>>df3
  Circle Division Power 
0 DDDD   DD       55   
1 FFFF   FF       68
2 AAAA   AA       87    
>>>df4
  Circle Division Power 
0 AAAA   AA       45   
1 CCCC   CC       56   

Expected result

>>>result_df
  Circle Division Power1 power2 power3 power4 Repeated
0 AAAA   AA       25     -      87     45     3
1 BBBB   BB       5      66     -      -      2
2 CCCC   CC       -      25     -      56     2 

I Tried to merge one by one But Stuck after that.

 m12=pd.merge(df1, df2, on=['Circle','Division'], how='inner',suffixes=('1',' 2'))
 m13=pd.merge(df1, df3, on=['Circle','Division'], how='inner',suffixes=('1',' 3'))
 m14=pd.merge(df1, df4, on=['Circle','Division'], how='inner',suffixes=('1',' 4'))
 m23=pd.merge(df2, df3, on=['Circle','Division'], how='inner',suffixes=('2',' 3'))
 m24=pd.merge(df2, df4, on=['Circle','Division'], how='inner',suffixes=('2',' 4'))
 m34=pd.merge(df3, df4, on=['Circle','Division'], how='inner',suffixes=('3',' 4'))

Upvotes: 2

Views: 33

Answers (1)

jezrael
jezrael

Reputation: 862671

Use concat with DataFrame.set_index and parameter keys for join all DataFrame together, flatten MultiIndex.

Create new column by DataFrame.count for get non NaNs values per rows and filter by boolean indexing:

dfs = [df1, df2, df3, df4]

comp = [x.set_index(['Circle','Division']) for x in dfs]
df = pd.concat(comp, axis=1, keys=(range(1, len(dfs)+ 1)))
df.columns = [f'{b}{a}' for a, b in df.columns]
df['Repeat'] = df.count(axis=1)

df = df[df['Repeat'] > 1]
df = df.reset_index()
print (df)
  Circle Division  Power1  Power2  Power3  Power4  Repeat
0   AAAA       AA    25.0     NaN    87.0    45.0       3
1   BBBB       BB     5.0    66.0     NaN     NaN       2
2   CCCC       CC     NaN    25.0     NaN    56.0       2

Upvotes: 2

Related Questions