JD2775
JD2775

Reputation: 3801

Comparing 2 columns in Pandas DataFrame and populating a 3rd column

I have a DF that has 2 int columns inside, 'CNT' and 'STG_TABLE_CNT'. I want to add a new column 'IS_MATCH' that returns 'Y' if 'CNT' and 'STG_TABLE_CNT' have the same value, or 'N' if they do not.

I tried this:

if result['CNT'] == result['STG_TABLE_CNT']:
    result['IS_MATCH'] = 'Y'
else:
    result['IS_MATCH'] = 'N'

but that is throwing the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I realize it is searching for equality (True/False) I am just not sure how to get around that to return 'Y' or 'N' instead

Upvotes: 0

Views: 251

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150735

result['CNT'] == result['STG_TABLE_CNT'] gives you a whole series, and pandas is complaining that it does not know how to convert that series into True or False. What you really want is something like

result['IS_MATCH'] = (result['CNT'] == result['STG_TABLE_CNT']).\
                         apply(lambda x: 'Y' if x else 'N')

or

result['IS_MATCH'] = (result['CNT'] == result['STG_TABLE_CNT']).\
                         map({True: 'Y', False: 'N'})

Upvotes: 1

Chris Adams
Chris Adams

Reputation: 18647

Use numpy.where, it's faster than apply or itterating:

import numpy as np

df['IS_MATCH'] = np.where(result['CNT'].eq(result['STG_TABLE_CNT']), 'Y', 'N')

Upvotes: 1

Related Questions