Larry Woo

Reputation: 13

Compare multiple sets of columns to produce boolean result column

Using Pandas, I have a dataframe that looks like this:

col_a   col_b   col_a1   col_b1
Larry   Larry   Peter    Peter
Lee     Lee     Jeremy   Ilia

I want to compare col_a to col_b, and col_a1 to col_b1. If both pairs match, indicate it in a new column (flag):

col_a   col_b   col_a1   col_b1   flag
Larry   Larry   Peter    Peter    True
Lee     Lee     Jeremy   Ilia     False

How can I do this?

Upvotes: 1

Views: 66

Answers (3)

fixxxer

Reputation: 16134

I find the following code much easier to read. Compare one pair of columns at a time, then combine the two results with & (element-wise AND) to get the flag column:

In one line:

In [18]: tf['flag'] = (tf['col_a'] == tf['col_b']) & (tf['col_a1'] == tf['col_b1'])

In [19]: tf
Out[19]: 
   col_a  col_b  col_a1 col_b1   flag
0  Larry  Larry   Peter  Peter   True
1    Lee    Lee  Jeremy   Ilia  False
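
For a version that can be pasted and run on its own, here is a minimal sketch that first rebuilds the question's frame. The name tf follows the session above; how the frame is constructed is an assumption based on the sample data in the question.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed construction).
tf = pd.DataFrame({
    "col_a": ["Larry", "Lee"],
    "col_b": ["Larry", "Lee"],
    "col_a1": ["Peter", "Jeremy"],
    "col_b1": ["Peter", "Ilia"],
})

# Each == yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
tf["flag"] = (tf["col_a"] == tf["col_b"]) & (tf["col_a1"] == tf["col_b1"])

print(tf)
```

The resulting column holds real booleans, so it can be used directly as a mask, for example tf[tf["flag"]]. Keep in mind that == treats NaN as unequal to NaN, so a pair that is missing on both sides would be flagged False.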

Upvotes: 0

Alvaro Fuentes

Reputation: 17455

You can use DataFrame.eval:

import pandas as pd

df = pd.DataFrame({
    "col_a":["Larry","Lee"],
    "col_b":["Larry","Lee"],
    "col_a1":["Peter","Jeremy"],
    "col_b1":["Peter","Ilia"]
    })

print(df)
df["flag"] = df.eval("col_a==col_b and col_a1==col_b1")
print(df)

Output:

   col_a  col_b  col_a1 col_b1
0  Larry  Larry   Peter  Peter
1    Lee    Lee  Jeremy   Ilia

   col_a  col_b  col_a1 col_b1   flag
0  Larry  Larry   Peter  Peter   True
1    Lee    Lee  Jeremy   Ilia  False

If it happens that the columns to be compared are stored in two lists like a_cols and b_cols you can do something like:

a_cols = ["col_a","col_a1"]
b_cols = ["col_b","col_b1"]
df["flag"] = df.eval(" and ".join("%s==%s" % pair for pair in zip(a_cols,b_cols)))   
print(df)

Output:

   col_a  col_b  col_a1 col_b1   flag
0  Larry  Larry   Peter  Peter   True
1    Lee    Lee  Jeremy   Ilia  False
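
If building an expression string feels fragile, the same list-driven comparison can be sketched without eval. This is an alternative to the approach above, not part of it. It assumes a_cols and b_cols are equal-length lists as in the example, and it compares the underlying NumPy arrays so that the differing column labels do not get in the way.

```python
import pandas as pd

df = pd.DataFrame({
    "col_a": ["Larry", "Lee"],
    "col_b": ["Larry", "Lee"],
    "col_a1": ["Peter", "Jeremy"],
    "col_b1": ["Peter", "Ilia"],
})

a_cols = ["col_a", "col_a1"]
b_cols = ["col_b", "col_b1"]

# Compare the two blocks of columns position by position.
# The result is a 2-D boolean array with one column per pair.
matches = df[a_cols].to_numpy() == df[b_cols].to_numpy()

# A row is flagged True only if every pair in that row matches.
df["flag"] = matches.all(axis=1)
print(df)
```

Because no string is assembled, this variant also works when column names contain spaces or other characters that would not parse inside an eval expression.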

Upvotes: 0

Gohawks

Reputation: 1134

You can use the apply function:

import pandas as pd

df = pd.DataFrame({'col_a':('A','B'), 'col_b':('A','B'), 'col_a1':('C','D'),'col_b1':('C','E')})

df = df[['col_a','col_b','col_a1','col_b1']]

# Return real booleans rather than the strings 'True'/'False'
df['flag'] = df.apply(lambda x: x['col_a'] == x['col_b'] and x['col_a1'] == x['col_b1'], axis=1)

print(df)

Upvotes: 1
