user2110417
user2110417

Reputation:

How to extract specific keywords within the frame and extract only that data with in the separators

I have a column data as follows:

abc|frame|gtk|enst.24|pc|hg|,abc|framex|gtk4|enst.35|pxc|h5g|,abc|frbx|hgk4|enst.23|pix|hokg|
abc|frame|gtk|enst.15|pc|hg|,abc|framex|gtk2|enst.59|pxc|h5g|,abc|frbx|hgk4|enst.18|pif|homg|
abc|frame|gtk|enst.98|pc|hg|,abc|framex|gtk1|enst.45|pxc|h5g|,abc|frbx|hgk4|enst.74|pig|hofg|
abc|frame|gtk|enst.34|pc|hg|,abc|framex|gtk1|enst.67|pxc|h5g|,abc|frbx|hgk4|enst.39|pik|hoqg|

I want to search and extract specific keywords within the frame and extract only that data with in the separators

Specific keywords are

enst.35
enst.18
enst.98
enst.63

The expected output is

abc|framex|gtk4|enst.35|pxc|h5g|
abc|frbx|hgk4|enst.18|pif|homg|
abc|frame|gtk|enst.98|pc|hg|
NA

I tried this herebut was not working effectively

Upvotes: 1

Views: 50

Answers (3)

David Erickson
David Erickson

Reputation: 16683

With str.extract and str.split(',) to take the last comma separated value:

df['Data2'] = df['Data'].str.extract('(^.*enst.35\|.+?\|.+?\||^.*enst.18\|.+?\|.+?\||^.*enst.98\|.+?\|.+?\||^.*enst.63\|.+?\|.+?\||)', expand=False).str.split(',').str[-1]

You could create a list of keywords and do list comprehension as well per another answer.

Upvotes: 0

Shubham Sharma
Shubham Sharma

Reputation: 71689

You can construct a regex pattern using the given keywords then use Series.str.findall to find all occurrences of regex in series:

import re

keywords = ['enst.35','enst.18','enst.98','enst.63']
pattern = '|'.join([rf'[^,]+{re.escape(k)}[^,]+'for k in keywords])
result = df['col'].str.findall(pattern).str.get(0)

#print(result)

0    abc|framex|gtk4|enst.35|pxc|h5g|
1     abc|frbx|hgk4|enst.18|pif|homg|
2        abc|frame|gtk|enst.98|pc|hg|
3                                 NaN
Name: col, dtype: object

You can test the regex pattern here

Upvotes: 1

user2110417
user2110417

Reputation:

You can try in bashscript as follows:

for STRING in enst.35 enst.18 enst.98 enst.63; do
  tr \, \\n < file.txt | grep "$STRING" || echo NA
done

Upvotes: 0

Related Questions