Reputation: 3587
I need to find the value of "taxid" in a large number of strings similar to the one given below. For this particular string, the "taxid" value is "9606". I need to discard everything else. The "taxid" may appear anywhere in the text, but will always be followed by a ":" and then a number.
score:0.86|taxid:9606(Human)|intact:EBI-999900
How do I write a regular expression for this in Python?
Upvotes: 0
Views: 217
Reputation: 45562
>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'
If there are multiple taxids, use re.findall, which returns a list of all matches:
>>> re.findall(r'taxid:(\d+)', s)
['9606']
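Note that re.search returns None when a string has no taxid, so calling .group(1) directly would raise AttributeError on such input. A minimal sketch of a guard, assuming you want None back in that case (the helper name extract_taxid and the second test string are made up for illustration):

```python
import re
from typing import Optional


def extract_taxid(s: str) -> Optional[str]:
    """Return the digits following 'taxid:', or None if there is no taxid."""
    match = re.search(r'taxid:(\d+)', s)
    # re.search returns None when nothing matches, so check before .group()
    return match.group(1) if match else None


print(extract_taxid('score:0.86|taxid:9606(Human)|intact:EBI-999900'))  # 9606
print(extract_taxid('score:0.86|intact:EBI-999900'))  # None
```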
Upvotes: 4
Reputation: 114098
import re
for line in lines:
    # search, not match: taxid may be the first, middle, or last field
    match = re.search(r"(?:^|\|)taxid:(\d+)", line)
    if match:
        print(match.group(1))
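Since the question mentions a large number of strings, one option is to compile the pattern once and loop over the input. This is only a sketch: the names TAXID_RE and iter_taxids are hypothetical, and every sample line except the first is invented for the example.

```python
import re

# Compile once and reuse for every line
TAXID_RE = re.compile(r'taxid:(\d+)')


def iter_taxids(lines):
    """Yield every taxid found in an iterable of strings."""
    for line in lines:
        # findall returns an empty list for lines without a taxid,
        # so those lines are skipped silently
        for taxid in TAXID_RE.findall(line):
            yield taxid


# Only the first line comes from the question; the rest are made-up samples
sample = [
    'score:0.86|taxid:9606(Human)|intact:EBI-999900',
    'taxid:10090(Mouse)|score:0.41',
    'score:0.10|intact:EBI-000000',
]
print(list(iter_taxids(sample)))  # ['9606', '10090']
```

Because iter_taxids is a generator, it can be fed an open file object directly without loading everything into memory.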
Upvotes: 0