Curious
Curious

Reputation: 3587

python regular expression substitute

I need to find the value of "taxid" in a large number of strings similar to one given below. For this particular string, the 'taxid' value is '9606'. I need to discard everything else. The "taxid" may appear anywhere in the text, but will always be followed by a ":" and then number.

score:0.86|taxid:9606(Human)|intact:EBI-999900

How to write regular expression for this in python.

Upvotes: 0

Views: 217

Answers (2)

Steven Rumbalski
Steven Rumbalski

Reputation: 45562

>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'

If there are multiple taxids, use re.findall, which returns a list of all matches:

>>> re.findall(r'taxid:(\d+)', s)
['9606']

Upvotes: 4

Joran Beasley
Joran Beasley

Reputation: 114098

for line in lines:
    match = re.match(".*\|taxid:([^|]+)\|.*",line)
    print match.groups()

Upvotes: 0

Related Questions