python regex for grabbing specific parts of a line

Question

I want to go through the lines in a file and grab certain parts of each one.

The lines are tab-separated and look like this: "2584 M108 K:14%" and "2585 M108 K:14% N:10%"

I have written the following expressions, but they are not working. First, I want to grab the M10* field and the letter that follows it and join them, taking only the first entry after the M10* (K in the example above).

Mutation = re.sub(r'.*	(.*)	.*:(.*)%.*', r'\1\2', line)

I want Mutation = M108K

Secondly I want to grab the percentage without the % symbol

Percentage = re.sub(r'.*	.*	.*:(.*)%.*', r'\1', line)

I want Percentage = 14

I am not very practiced at writing regular expressions; these don't really work and are inefficient. Any help fixing or optimising them is appreciated.

Avinash Raj · Accepted Answer

I would do all of this with a single regex. .* is greedy: it consumes as many characters as possible. To make it match as few characters as possible instead, add the ? quantifier right after the *.
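To make the difference concrete, here is a small sketch that runs the greedy pattern from the question and a non-greedy one against the second sample line. The variable names are only for illustration, and the tabs are written as \t escapes so they stay visible.

```python
import re

# Second sample line from the question, with tabs written as \t.
line = "2585\tM108\tK:14%\tN:10%"

# Greedy pattern from the question: the leading .* runs as far right as it
# can, so the groups end up capturing the wrong fields.
greedy = re.sub(r'.*\t(.*)\t.*:(.*)%.*', r'\1\2', line)

# Non-greedy version: each .*? stops at the first tab, colon or percent sign.
lazy = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', line)

print(repr(greedy))  # 'K:14%10'
print(repr(lazy))    # 'M108K 14'
```

Note that the pattern from the question also never captures the K, because the .* before the colon is outside any group.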

>>> import re
>>> s = "2585\tM108\tK:14%\tN:10%"
>>> re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s)
'M108K 14'

or

>>> mutation, percentage = re.sub(r'^.*?\t(.*?)\t(.*?):(.*?)%.*', r'\1\2 \3', s).split()
>>> mutation
'M108K'
>>> percentage
'14'
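If you are looping over a whole file, a possible alternative is to match each line and read the groups directly rather than substituting and splitting. This is only a sketch under a few assumptions: the fields are tab-separated, the percentage is a whole number, and the group names (site, residue, pct) and the lines list are made up for the example.

```python
import re

# Hypothetical sample data; in practice these would come from the file.
lines = [
    "2584\tM108\tK:14%",
    "2585\tM108\tK:14%\tN:10%",
]

# [^\t]* cannot cross a tab, so each field stops where it should
# without relying on non-greedy quantifiers.
pattern = re.compile(
    r'^[^\t]*\t(?P<site>[^\t]*)\t(?P<residue>[^:\t]*):(?P<pct>\d+)%'
)

results = []
for line in lines:
    m = pattern.match(line)
    if m is None:
        continue  # skip lines that do not have the expected layout
    mutation = m.group('site') + m.group('residue')
    percentage = int(m.group('pct'))
    results.append((mutation, percentage))

print(results)  # [('M108K', 14), ('M108K', 14)]
```

Compiling the pattern once and skipping non-matching lines keeps the loop safe on malformed input; if the percentages can be fractional, the \d+ part would need adjusting.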
