lindak

Reputation: 167

pandas find perfect match in substring

I have a pandas df:

      0       1       2            3
0  chr1   69090   70008   OR4F5|CDS3
1  chr1  450739  451678  OR4F29|CDS1
2  chr1  925917  926037  SAMD11|CDS2
3  chr1  930154  930336     SAM|CDS2
4  chr1  940555  947899   ERSAM|CDS1
5  chr1  944686  944806   NOC2L|CDS3
6  chr1  945041  945161   NOC2L|CDS3

and a list:

genes = ["OR4F5", "SAM"]

How can I extract the rows whose gene name (the part of column 3 before the "|") exactly matches an entry in the list? My attempt:

out = pd.DataFrame()
for gene in genes:
    out = pd.concat([out, df[df[3].str.match(gene)]])

This yields (note that SAMD11 is matched as well):

     0       1       2            3
0  chr1   69090   70008   OR4F5|CDS3
2  chr1  925917  926037  SAMD11|CDS2
3  chr1  930154  930336     SAM|CDS2

The desired output is:

     0       1       2            3
0  chr1   69090   70008   OR4F5|CDS3
3  chr1  930154  930336     SAM|CDS2

I would love to see a regex solution, since I have been trying to get my head around regex but couldn't get it to work.

Upvotes: 0

Views: 146

Answers (1)

BENY

Reputation: 323266

IIUC, use str.split + isin: split column 3 on '|' and keep the rows where any of the pieces is in genes.

df[df[3].str.split('|', expand=True).isin(genes).any(axis=1)]
Out[252]: 
      0       1       2           3
0  chr1   69090   70008  OR4F5|CDS3
3  chr1  930154  930336    SAM|CDS2
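
Since the question asks for a regex, here is a minimal sketch of one way it could be done, assuming the gene name is always the part before the first "|" and that the columns are labelled with integers as in the question. The original attempt picks up SAMD11 because str.match only anchors at the start of the string, so "SAM" matches any value that begins with SAM. Requiring a literal "|" right after the gene name avoids that. The name `pattern` is introduced here for illustration only.

```python
import re
import pandas as pd

# Rebuild the frame from the question (integer column labels 0..3)
df = pd.DataFrame([
    ["chr1", 69090, 70008, "OR4F5|CDS3"],
    ["chr1", 450739, 451678, "OR4F29|CDS1"],
    ["chr1", 925917, 926037, "SAMD11|CDS2"],
    ["chr1", 930154, 930336, "SAM|CDS2"],
    ["chr1", 940555, 947899, "ERSAM|CDS1"],
    ["chr1", 944686, 944806, "NOC2L|CDS3"],
    ["chr1", 945041, 945161, "NOC2L|CDS3"],
])
genes = ["OR4F5", "SAM"]

# Builds ^(?:OR4F5|SAM)\| : start of string, one of the gene names,
# then a literal pipe. re.escape guards against regex metacharacters
# in gene names; the group is non-capturing so str.contains does not warn.
pattern = r"^(?:" + "|".join(re.escape(g) for g in genes) + r")\|"

out = df[df[3].str.contains(pattern, regex=True)]
print(out)
```

The "^" keeps ERSAM out and the trailing "\|" keeps SAMD11 out, so only rows 0 and 3 should remain. If some values might lack the "|" suffix, replacing the trailing "\|" with "(?:\||$)" would be one way to cover that case.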

Upvotes: 3
