Reputation: 61
I have two dataframe columns that looks something like this:
col1 col2
0 A2 B8
1 B8 C3
2 D2 A2
I want to identify all values in col1 which are not in col2 and visa versa. In the above example, the output would be:
C3, D2
Both lists are not unique.
Upvotes: 0
Views: 1034
Reputation: 10359
Why not convert each column to a set
and calculate the symmetric difference?
import pandas as pd
df = pd.DataFrame({'col1': ['A2', 'B8', 'D2'],
'col2': ['B8', 'C3', 'A2']})
print set(df['col1']).symmetric_difference(set(df['col2']))
This prints:
set(['C3', 'D2'])
EDIT:
If you want to track which elements came from where, you could adapt this and create a new dictionary, like so:
col1 = set(df['col1'])
col2 = set(df['col2'])
exclusive_items = {'col1': [x for x in col1 if x not in col2],
'col2': [x for x in col2 if x not in col1]}
print exclusive_items
Where each key in exclusive_items
contains as its values the entries unique to that column. This prints:
{'col2': ['C3'], 'col1': ['D2']}
In fact, as we have sets we could simplify this to:
exclusive_items = {'col1': col1-col2,
'col2': col2-col1}
Upvotes: 1
Reputation: 1369
import pandas as pd
df = pd.DataFrame({'col1': ['A2', 'B8', 'D2'],
'col2': ['B8', 'C3', 'A2']})
in_col1_not_in_col2 = list(set(df['col1'])-(set(df['col2'])))
in_col2_not_in_col1 = list(set(df['col2'])-(set(df['col1'])))
print('in_col1_not_in_col2: ')
print(in_col1_not_in_col2)
print('in_col2_not_in_col1: ')
print(in_col2_not_in_col1)
in_col1_not_in_col2: ['D2']
in_col2_not_in_col1: ['C3']
Upvotes: 0