Reputation: 1693
Hopefully someone can help. I'm trying to use a regular expression to extract the text that follows a pattern in a string, but it's not working and I'm not sure why. The regex works fine in Linux...
>>> import re
>>> s = "GeneID:5408878;gbkey=CDS;product=carboxynorspermidinedecarboxylase;protein_id=YP_001405731.1"
>>> x = re.search(r'(?<=protein_id=)[^;]*',s)
>>> print(x)
<_sre.SRE_Match object at 0x000000000345B7E8>
Upvotes: 1
Views: 237
Reputation: 142176
You should probably think about rewriting your regex so that it finds all the key=value pairs; then you don't have to muck around with specific groups and hard-coded lookbehinds...
import re
# s is the string from the question; each match is a (key, value) tuple
kv = dict(re.findall(r'(\w+)=([^;]+)', s))
# {'gbkey': 'CDS', 'product': 'carboxynorspermidinedecarboxylase', 'protein_id': 'YP_001405731.1'}
print(kv['protein_id'])
# YP_001405731.1
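For what it's worth, the original search in the question does not appear to be failing. `re.search` returns a match object (or `None` when nothing matches), and printing that object only shows its repr. A minimal sketch of pulling the matched text out with `group(0)`; the variable names `m` and `protein_id` are just illustrative:

```python
import re

s = "GeneID:5408878;gbkey=CDS;product=carboxynorspermidinedecarboxylase;protein_id=YP_001405731.1"

# re.search gives back a match object on success and None on failure,
# so check it before asking for the matched text.
m = re.search(r'(?<=protein_id=)[^;]*', s)
protein_id = m.group(0) if m else None

print(protein_id)
# YP_001405731.1
```

The `if m else None` guard is there so a string without a `protein_id=` field doesn't raise an `AttributeError`.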
Upvotes: 4