Reputation: 61

Comparing two dataframe columns to see if they have the same values

I have two dataframe columns that looks something like this:

   col1    col2 
0   A2      B8
1   B8      C3
2   D2      A2

I want to identify all values in col1 which are not in col2 and visa versa. In the above example, the output would be:

C3, D2

Both lists are not unique.

Upvotes: 0

Answers (2)

asongtoruin

Reputation: 10369

Why not convert each column to a set and calculate the symmetric difference?

import pandas as pd

df = pd.DataFrame({'col1': ['A2', 'B8', 'D2'],
                   'col2': ['B8', 'C3', 'A2']})

print set(df['col1']).symmetric_difference(set(df['col2']))

This prints:

set(['C3', 'D2'])

EDIT:

If you want to track which elements came from where, you could adapt this and create a new dictionary, like so:

col1 = set(df['col1'])
col2 = set(df['col2'])

exclusive_items = {'col1': [x for x in col1 if x not in col2],
                   'col2': [x for x in col2 if x not in col1]}

print exclusive_items

Where each key in exclusive_items contains as its values the entries unique to that column. This prints:

{'col2': ['C3'], 'col1': ['D2']}

In fact, as we have sets we could simplify this to:

exclusive_items = {'col1': col1-col2,
                   'col2': col2-col1}

Upvotes: 1

paolof89

Reputation: 1369

import pandas as pd

df = pd.DataFrame({'col1': ['A2', 'B8', 'D2'],
                   'col2': ['B8', 'C3', 'A2']})


in_col1_not_in_col2 = list(set(df['col1'])-(set(df['col2'])))
in_col2_not_in_col1 = list(set(df['col2'])-(set(df['col1'])))

print('in_col1_not_in_col2: ')
print(in_col1_not_in_col2)
print('in_col2_not_in_col1: ')
print(in_col2_not_in_col1)

in_col1_not_in_col2: ['D2']

in_col2_not_in_col1: ['C3']

Upvotes: 0

Comparing two dataframe columns to see if they have the same values

Answers (2)

Related Questions