Reputation: 482
I want to go through the lines in a file and grab certain parts of each one.
The lines look like this: "2584\tM108\tK:14%" and "2585\tM108\tK:14%\tN:10%"
I have written the following expressions, but they are failing me. First, I want to grab the M10* code and the K and join them, taking only the first entry after the M10* (K in the example above).
Mutation = re.sub(r'.*\t(.*)\t.*:(.*)%.*', r'\1\2', line)
I want Mutation = M108K
Secondly I want to grab the percentage without the % symbol
Percentage = re.sub(r'.*\t.*\t.*:(.*)%.*', r'\1', line)
I want Percentage = 14
I'm not very practiced at writing regular expressions; these don't really work and are inefficient. Any help fixing or optimising them is appreciated.
Upvotes: 2
Views: 37
Reputation: 174834
I would do all of this in a single regex. .* is greedy, so it consumes as many characters as it can. To make the match non-greedy, add a ? immediately after the *, giving .*?, which matches as few characters as possible.
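To make the difference concrete, here is a small sketch that runs the greedy pattern from the question and a non-greedy version on the second sample line. The variable names are only illustrative.

```python
import re

line = "2585\tM108\tK:14%\tN:10%"

# Greedy: the leading .* grabs as much as it can, so the capture
# groups end up on the later fields instead of the first ones.
greedy = re.sub(r'.*\t(.*)\t.*:(.*)%.*', r'\1\2', line)

# Non-greedy: each .*? stops at the first tab, colon, or percent sign
# that still lets the rest of the pattern match.
lazy = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2', line)

print(repr(greedy))  # 'K:14%10'
print(repr(lazy))    # 'M108K'
```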
>>> import re
>>> s = "2585\tM108\tK:14%\tN:10%"
>>> re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s)
'M108K 14'
or, to unpack the two values into separate variables:
>>> mutation,percentage = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s).split()
>>> mutation
'M108K'
>>> percentage
'14'
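As an alternative to re.sub followed by split, the fields can also be read straight from the capture groups of a match. This is only a sketch: parse_line and FIELD_RE are hypothetical names, and it assumes the percentage is always a whole number and that fields are separated by single tabs.

```python
import re

# Assumed layout: id <tab> code <tab> residue:percent% [<tab> more fields]
# [^\t] cannot cross a tab, so only the first entry after the code is used.
FIELD_RE = re.compile(r'^[^\t]*\t([^\t]+)\t([^:\t]+):(\d+)%')

def parse_line(line):
    """Return (mutation, percentage) or None if the line does not fit."""
    m = FIELD_RE.match(line)
    if m is None:
        return None
    code, residue, pct = m.groups()
    return code + residue, int(pct)

lines = ["2584\tM108\tK:14%", "2585\tM108\tK:14%\tN:10%"]
results = [parse_line(l) for l in lines]
print(results)  # [('M108K', 14), ('M108K', 14)]
```

Using negated character classes instead of .*? keeps each group from running past a tab, and a line that does not fit the layout gives None instead of being passed through unchanged, as re.sub would do.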
Upvotes: 3