user3234810

Reputation: 482

python regex for grabbing specific parts of a line

Want to go through lines in a file and grab certain parts of them.

The lines look like the following: "2584\tM108\tK:14%" and "2585\tM108\tK:14%\tN:10%"

I have written the following expressions but they seem to be failing me... Firstly I am looking to grab the M10* and the K, and stick them together, taking only the first entry after the M10* (in the above example K).

Mutation = re.sub(r'.*\t(.*)\t.*:(.*)%.*', r'\1\2', line)

I want Mutation = M108K

Secondly I want to grab the percentage without the % symbol

Percentage = re.sub(r'.*\t.*\t.*:(.*)%.*', r'\1', line)

I want Percentage = 14

I am not very practiced at writing expressions; these don't really work and are inefficient. Any help fixing/optimising them is appreciated.

Upvotes: 2

Views: 37

Answers (1)

Avinash Raj

Reputation: 174834

I would do all of this in a single regex. .* is greedy: it matches as many characters as possible and gives some back only when the rest of the pattern could not match otherwise. To make it non-greedy (match as few characters as possible), add a ? right after the *.
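
To see the difference concretely, here is a small sketch that runs the greedy pattern from the question and a non-greedy version side by side on the second sample line. The variable names (line, greedy, lazy) are only illustrative.

```python
import re

# Second sample line from the question.
line = "2585\tM108\tK:14%\tN:10%"

# Greedy pattern from the question: the first '.*' runs past the first tab,
# so the groups land on the wrong fields (and the K is never captured).
greedy = re.sub(r'.*\t(.*)\t.*:(.*)%.*', r'\1\2', line)

# Non-greedy: each '.*?' stops at the first point where the rest can match.
lazy = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', line)

print(repr(greedy))  # 'K:14%10'
print(repr(lazy))    # 'M108K 14'
```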

>>> import re
>>> s = "2585\tM108\tK:14%\tN:10%"
>>> re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s)
'M108K 14'

or

>>> mutation,percentage = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s).split()
>>> mutation
'M108K'
>>> percentage
'14'
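
As an alternative to re.sub followed by split, the fields can be read straight from the match groups. This is only a sketch, not the only way to do it: LINE_RE and parse_line are made-up names, the negated character classes ([^\t], [^:\t]) replace the non-greedy .*?, and it assumes the percentage is always a whole number.

```python
import re

# Hypothetical helper names. Assumes tab-separated fields and an
# integer percentage, as in the sample lines from the question.
LINE_RE = re.compile(r'^[^\t]*\t([^\t]+)\t([^:\t]+):(\d+)%')

def parse_line(line):
    """Return (mutation, percentage), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    position, residue, percent = m.groups()
    # Join e.g. 'M108' and 'K'; convert '14' to the integer 14.
    return position + residue, int(percent)

lines = ["2584\tM108\tK:14%", "2585\tM108\tK:14%\tN:10%"]
for line in lines:
    print(parse_line(line))
```

Using match instead of sub also means a line that does not fit the pattern yields None rather than being passed through unchanged.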

Upvotes: 3
