Reputation: 2009
How to get the difference or delta of 2 pandas dataframes having different column names but contains similar type of data.
Ex below:
df1 =
'Site ID' 'Band'
0 101 850
1 101 900
2 102 850
3 A01 850
4 A01 900
5 X12 900
6 B08 850
df2 =
'SITENO' 'FREQ'
0 101 850
1 101 900
2 102 850
3 A01 850
4 A01 900
5 B08 900
6 Z99 850
Now i want to have an output similar to this
delta_df1 =
'Site ID' 'Band'
0 B08 850
1 X12 900
delta_df2 =
'SITENO' 'FREQ'
0 B08 900
1 Z99 850
sorry i have to edit the posted question, i missed something above. There should aslo a site with same site id/siteno but with different band. delta should yield an output base on those pair.
Upvotes: 0
Views: 374
Reputation: 880797
Given the two columns:
col1 = df1['Site ID']
col2 = df2['SITENO']
You could find the set difference with
In [107]: set(col1).difference(col2)
Out[107]: {'X12'}
which you could use to build a boolean selection mask:
In [108]: col1.isin(set(col1).difference(col2))
Out[108]:
0 False
1 False
2 False
3 False
4 False
5 True
Name: Site ID, dtype: bool
and select rows of df1
using df1.loc
:
In [109]: df1.loc[col1.isin(set(col1).difference(col2))]
Out[109]:
Site ID Band
5 X12 900
import pandas as pd
df1 = pd.read_table('data', sep='\s{2,}')
df2 = pd.read_table('data2', sep='\s+')
col1 = df1['Site ID']
col2 = df2['SITENO']
delta_df1 = df1.loc[col1.isin(set(col1).difference(col2))]
print(delta_df1)
delta_df2 = df2.loc[col2.isin(set(col2).difference(col1))]
print(delta_df2)
yields
Site ID Band
5 X12 900
SITENO FREQ
5 B08 850
6 Z99 850
Upvotes: 1