Python: test whether a value in a pandas DataFrame column is a member of a set named in another column

Question

If I have the following CSV file, test.csv:

C01,45,A,R
C02,123,H,I

where I have defined the sets R and I as

R=set(['R','E','D','N','P','H','K'])
I=set(['I','H','G','F','A','C','L','M','P','Q','S','T','V','W','Y'])

I want to be able to test if the string A is a member of set R (which is false) and if string H is a member of set I (which is true). I have tried to do this with the following script:

#!/usr/bin/env python
import pandas as pd

I=set(['I','H','G','F','A','C','L','M','P','Q','S','T','V','W','Y'])
R=set(['R','E','D','N','P','H','K'])

with open('test.csv') as f:
    table = pd.read_table(f, sep=',', header=None, lineterminator='\n')
table[table.columns[3]].astype(str).isin(table[table.columns[4]].astype(str))

In other words, I am trying to do the equivalent of 'A' in R, or rather table.columns[3] in table.columns[4], and return TRUE or FALSE for each row of data.

The only problem is that, with this final line, both rows return TRUE. If I change the final line to

table[table.columns[3]].astype(str).isin(R)

Then I get

0   FALSE
1   TRUE

which is correct. It seems that I am not referencing the set name correctly when doing .isin(table[table.columns[4]].astype(str))

Any ideas?
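
For reference, here is a minimal sketch of the row-wise test I am after. As far as I understand, .isin() compares each value against the collection it is given, so passing the other column checks against the strings stored in that column (the set names) rather than against the Python sets R and I. The sketch assumes a hypothetical dictionary, sets_by_name, that maps each name in the CSV to the actual set, and it assumes the value is in column 2 and the set name in column 3, as in the four-field sample rows above (my script uses different positions). The sample rows are inlined so it runs without test.csv.

```python
import io

import pandas as pd

# Sets from the question.
R = set(['R', 'E', 'D', 'N', 'P', 'H', 'K'])
I = set(['I', 'H', 'G', 'F', 'A', 'C', 'L', 'M', 'P', 'Q', 'S', 'T', 'V', 'W', 'Y'])

# Hypothetical lookup (not in the original script): set name in the CSV -> set object.
sets_by_name = {'R': R, 'I': I}

# Inline copy of the two sample rows, standing in for test.csv.
csv_text = "C01,45,A,R\nC02,123,H,I\n"
table = pd.read_csv(io.StringIO(csv_text), header=None)

# Assumption: with header=None the columns are labelled 0..3,
# so the value is in column 2 and the set name is in column 3.
is_member = pd.Series(
    [value in sets_by_name[name] for value, name in zip(table[2], table[3])],
    index=table.index,
)
print(is_member)
```

Each row is tested against its own set, so a different set can be selected per row; a name that is missing from sets_by_name would raise a KeyError in this sketch.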
