Reputation: 9752
I have the following CSV file, test.csv:
C01,45,A,R
C02,123,H,I
where I have defined the sets R and I as
R=set(['R','E','D','N','P','H','K'])
I=set(['I','H','G','F','A','C','L','M','P','Q','S','T','V','W','Y'])
I want to be able to test whether the string A is a member of set R (which is false) and whether the string H is a member of set I (which is true). I have tried to do this with the following script:
#!/usr/bin/env python
import pandas as pd
I=set(['I','H','G','F','A','C','L','M','P','Q','S','T','V','W','Y'])
R=set(['R','E','D','N','P','H','K'])
with open('test.csv') as f:
    table = pd.read_table(f, sep=',', header=None, lineterminator='\n')
table[table.columns[3]].astype(str).isin(table[table.columns[4]].astype(str))
i.e. I am trying to do the equivalent of A in R, or rather table.columns[3] in table.columns[4], and return TRUE or FALSE for each row of data.
The only problem is that, with the final line as written, both rows return TRUE. If I change the final line to
table[table.columns[3]].astype(str).isin(R)
then I get
0 FALSE
1 TRUE
which is correct. It seems that I am not referencing the set name correctly when doing .isin(table[table.columns[4]].astype(str)).
Any ideas?
Upvotes: 0
Views: 749
Reputation: 95948
Starting with the following:
In [21]: df
Out[21]:
     0    1  2  3
0  C01   45  A  R
1  C02  123  H  I
In [22]: R=set(['R','E','D','N','P','H','K'])
...: I=set(['I','H','G','F','A','C','L','M','P','Q','S','T','V','W','Y'])
...:
You could do something like this:
In [23]: sets = {"R":R,"I":I}
In [24]: df.apply(lambda S: S[2] in sets[S[3]],axis=1)
Out[24]:
0 False
1 True
dtype: bool
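For reference, here is a self-contained sketch of the same idea. It reads the two sample rows from an in-memory string instead of test.csv (an assumption made so the snippet runs anywhere), and the names csv_text, per_row and against_column are purely illustrative. It also shows what goes wrong with passing a column to .isin: the values are tested against the contents of that column (the strings 'R' and 'I'), not against the Python sets that happen to have those names.

```python
import io

import pandas as pd

R = {'R', 'E', 'D', 'N', 'P', 'H', 'K'}
I = {'I', 'H', 'G', 'F', 'A', 'C', 'L', 'M', 'P', 'Q', 'S', 'T', 'V', 'W', 'Y'}

# Map the set *names* that appear in the file to the actual set objects.
sets = {"R": R, "I": I}

# Stand-in for test.csv (assumption: same two rows as in the question).
csv_text = "C01,45,A,R\nC02,123,H,I\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Row by row: is the letter in column 2 a member of the set named in column 3?
per_row = df.apply(lambda row: row[2] in sets[row[3]], axis=1)

# For comparison: this checks column 2 against the literal values in
# column 3 ('R' and 'I'), so the sets R and I are never consulted.
against_column = df[2].isin(df[3])

print(per_row)
print(against_column)
```

The dictionary is what turns the string 'R' read from the file into the set R; pandas has no way of knowing that a cell value is meant to be a variable name.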
Fair warning: .apply is slow and doesn't scale well to larger data. It is there for convenience and as a last resort.
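If the row-wise .apply turns out to be too slow, one possible alternative is to loop over the sets rather than over the rows and let .isin handle each set in one pass. This is only a sketch: the DataFrame is rebuilt inline from the sample data, the name mask is illustrative, and whether it is actually faster depends on how many distinct sets there are relative to the number of rows.

```python
import pandas as pd

R = {'R', 'E', 'D', 'N', 'P', 'H', 'K'}
I = {'I', 'H', 'G', 'F', 'A', 'C', 'L', 'M', 'P', 'Q', 'S', 'T', 'V', 'W', 'Y'}
sets = {"R": R, "I": I}

# Same sample data as above, built inline (columns are labelled 0..3).
df = pd.DataFrame([["C01", 45, "A", "R"],
                   ["C02", 123, "H", "I"]])

# Start with all False, then switch on the rows that match each set.
mask = pd.Series(False, index=df.index)
for name, members in sets.items():
    # Rows that name this set AND whose letter belongs to it.
    mask |= (df[3] == name) & df[2].isin(members)

print(mask)
```

One difference in behavior: a row whose set name is missing from the dictionary simply comes out False here, whereas the .apply version raises a KeyError.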
Upvotes: 0