JCm
JCm

Reputation: 2009

How to get the difference or delta of 2 pandas dataframes

How to get the difference or delta of 2 pandas dataframes having different column names but contains similar type of data.

Ex below:

df1 = 
     'Site ID'     'Band'
0     101           850
1     101           900
2     102           850
3     A01           850
4     A01           900
5     X12           900
6     B08           850

df2 =
     'SITENO'      'FREQ'
0     101           850
1     101           900
2     102           850
3     A01           850
4     A01           900
5     B08           900
6     Z99           850

Now i want to have an output similar to this

delta_df1 = 
     'Site ID'     'Band'
0     B08           850
1     X12           900

delta_df2 =
     'SITENO'      'FREQ'
0     B08           900
1     Z99           850

sorry i have to edit the posted question, i missed something above. There should aslo a site with same site id/siteno but with different band. delta should yield an output base on those pair.

Upvotes: 0

Views: 374

Answers (1)

unutbu
unutbu

Reputation: 880797

Given the two columns:

col1 = df1['Site ID']
col2 = df2['SITENO']

You could find the set difference with

In [107]: set(col1).difference(col2)
Out[107]: {'X12'}

which you could use to build a boolean selection mask:

In [108]: col1.isin(set(col1).difference(col2))
Out[108]: 
0    False
1    False
2    False
3    False
4    False
5     True
Name: Site ID, dtype: bool

and select rows of df1 using df1.loc:

In [109]: df1.loc[col1.isin(set(col1).difference(col2))] 
Out[109]: 
  Site ID  Band
5     X12   900

import pandas as pd

df1 = pd.read_table('data', sep='\s{2,}')
df2 = pd.read_table('data2', sep='\s+')

col1 = df1['Site ID']
col2 = df2['SITENO']

delta_df1 = df1.loc[col1.isin(set(col1).difference(col2))] 
print(delta_df1)

delta_df2 = df2.loc[col2.isin(set(col2).difference(col1))] 
print(delta_df2)

yields

  Site ID  Band
5     X12   900

  SITENO  FREQ
5    B08   850
6    Z99   850

Upvotes: 1

Related Questions