Reputation: 27

Compare column values in pandas

I have a code that contains two columns related to gender for an individual. I would like to see if there are any discrepancies as well as find missing values. I am unfamiliar with how to do this. Any help or resources you can provide would be greatly appreciated as I am still new. If possible I would also like to create a third column that states the outcome.

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''], 
      'Second': ['Male', 'Male', '', 'Female', '', 'Male']
df = pd.DataFrame(data = d)

#possible output column
df['Third'] = ['match', 'discrepancy', 'missing', 'match', 'discrepancy', 'discrepancy']

Upvotes: 1

Answers (3)

sitting_duck

Reputation: 3730

Solve by using nested numpy np.where():

df['Third'] = np.where(df.First == df.Second, np.where(df.First.str.len() > 0, 'match', 'missing'), 'discrepancy')
print(df)

Output:

    First  Second        Third
0    Male    Male        match
1  Female    Male  discrepancy
2                      missing
3  Female  Female        match
4    Male          discrepancy
5            Male  discrepancy

Upvotes: 1

Ananay Mital

Reputation: 1475

Define a function, apply it to each row and save the result in the "Third" column

def compare(row):
    first = row["First"]
    second = row["Second"]
    if len(first) == 0 and len(second) == 0:
        return "missing"
    if pd.isna(first) and pd.isna(second):
        return "missing"    
    if first != second:
        return "discrepancy"
    return "match"
df["Third"] = df.apply(compare, axis=1)
df

axis=1 applies the compare function row wise. The default is axis=0 which sends each column to the function.

Upvotes: 1

Henry Ecker

Reputation: 35686

You can apply a function to every row using apply on axis=1.

You can adjust this function to return any value you would like based on your conditions.

import pandas as pd

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''],
     'Second': ['Male', 'Male', '', 'Female', '', 'Male']}
df = pd.DataFrame(data=d)


def eval_row(r):
     if not r['First'] and not r['Second']:
          return 'missing'
     elif r['First'] == r['Second']:
          return 'match'
     else:
          return 'discrepancy'

df['Third'] = df.apply(eval_row, axis=1)
print(df)

Produces:

    First  Second        Third
0    Male    Male        match
1  Female    Male  discrepancy
2                      missing
3  Female  Female        match
4    Male          discrepancy
5            Male  discrepancy

Upvotes: 2

Compare column values in pandas

Answers (3)

Related Questions