Reputation: 167
I have a pandas df:
0 1 2 3
0 chr1 69090 70008 OR4F5|CDS3
1 chr1 450739 451678 OR4F29|CDS1
2 chr1 925917 926037 SAMD11|CDS2
3 chr1 930154 930336 SAM|CDS2
4 chr1 940555 947899 ERSAM|CDS1
5 chr1 944686 944806 NOC2L|CDS3
6 chr1 945041 945161 NOC2L|CDS3
and a list:
genes = ["OR4F5", "SAM"]
How can I extract only the rows whose gene name (the part before the |) exactly matches an entry in the list?
My attempt:
out = pd.DataFrame()
for gene in genes:
    out = pd.concat([out, df[df[3].str.match(gene)]])
Yields:
0 1 2 3
0 chr1 69090 70008 OR4F5|CDS3
2 chr1 925917 926037 SAMD11|CDS2
3 chr1 930154 930336 SAM|CDS2
The desired output is:
0 1 2 3
0 chr1 69090 70008 OR4F5|CDS3
3 chr1 930154 930336 SAM|CDS2
I would love to see a regex solution, since I have been trying to get my head around regex but could not get it to work.
Upvotes: 0
Views: 146
Reputation: 323266
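IIUC, you can combine str.split with isin. Splitting on '|' puts the gene name in its own column, and isin only accepts whole-value matches:
df[df[3].str.split('|', expand=True).isin(genes).any(axis=1)]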
Out[252]:
0 1 2 3
0 chr1 69090 70008 OR4F5|CDS3
3 chr1 930154 930336 SAM|CDS2
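Since the question asks for regex: the loop in the question fails because str.match only anchors at the start of the string, so "SAM" also matches "SAMD11". Here is a minimal sketch of one way around that, assuming every value in column 3 has the form NAME|SUFFIX and that the column labels are the integers 0 to 3 as in the question. The names alternatives and pattern are just illustrative.

```python
import re

import pandas as pd

# Sample frame rebuilt from the question; default integer column labels 0..3.
df = pd.DataFrame(
    [
        ["chr1", 69090, 70008, "OR4F5|CDS3"],
        ["chr1", 450739, 451678, "OR4F29|CDS1"],
        ["chr1", 925917, 926037, "SAMD11|CDS2"],
        ["chr1", 930154, 930336, "SAM|CDS2"],
        ["chr1", 940555, 947899, "ERSAM|CDS1"],
        ["chr1", 944686, 944806, "NOC2L|CDS3"],
        ["chr1", 945041, 945161, "NOC2L|CDS3"],
    ]
)
genes = ["OR4F5", "SAM"]

# Escape each name so any regex metacharacters in it are taken literally.
alternatives = "|".join(re.escape(g) for g in genes)

# str.match anchors at the start of the string; the literal \| after the
# group forces the gene name to end exactly where the separator begins.
pattern = rf"(?:{alternatives})\|"

out = df[df[3].str.match(pattern)]
print(out)
```

This keeps rows 0 and 3 only. "SAMD11" is rejected because "D" follows "SAM" instead of "|", and "ERSAM" is rejected because the match has to start at the first character.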
Upvotes: 3