abdulhaq-e

Reputation: 626

Which regex in Python extracts the values from this text?

I have a complicated text file, here is part of it:

& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\

I'm interested in the numbers after the {*}. Here is what I tried with no luck:

import re
m = re.findall(r'{\*}{(.+)}', r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\')

However, I get the following result:

['52.7} & \\multirow{2}{*}{3} & \\multirow{2}{*}{$<$ 0.001']

I tried many other combinations, but I get either the first number (e.g. 52.7), the middle number (3), or the result above. How can I get 52.7, 3 and $<$ 0.001 as separate items in a list?

Upvotes: 1

Views: 54

Answers (3)

ssedano

Reputation: 8432

m = re.findall(r'({\*}{([\d|\.?]+?)})+', r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\')
[('{*}{52.7}', '52.7'), ('{*}{3}', '3')]

m = re.findall(r'{\*}{([\d|\.?]+?)}+', r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\')
['52.7', '3']

m = re.findall(r'{\*}{(.*?)}', r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\')
['52.7', '3', '$<$ 0.001']

Upvotes: 1

ualinker

Reputation: 745

That's because the + and * quantifiers are greedy by default: .+ grabs as much as it can, so it runs all the way to the last } in the line. Use the non-greedy (lazy) form instead:

{\*}{(.+?)}

Reference: http://www.regular-expressions.info/repeat.html ("Watch Out for The Greediness!" section)
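To make the difference concrete, here is a small sketch that runs both the greedy pattern from the question and the lazy pattern on the sample line. The variable names (text, greedy, lazy) are only for illustration, and the sample is written as a raw string so the backslashes stay literal.

```python
import re

# Sample line from the question, as a raw string so backslashes are kept as-is.
text = r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\'

# Greedy: .+ consumes up to the LAST closing brace in the line.
greedy = re.findall(r'{\*}{(.+)}', text)

# Lazy: .+? stops at the FIRST closing brace after each {*}{.
lazy = re.findall(r'{\*}{(.+?)}', text)

print(greedy)  # one long match spanning all three cells
print(lazy)    # ['52.7', '3', '$<$ 0.001']
```

The greedy version returns a single item because the one match already swallows the other two cells, which is exactly the result shown in the question.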

Upvotes: 3

Justin McDonald

Reputation: 2166

Use the following regular expression:

\{\*\}\{(.*?)\}

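You should escape all special characters (the braces and the asterisk) with a backslash \ and put the non-greedy wildcard .*? inside a capture group, so that only the value between the braces ends up in the result list.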
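As a rough sketch of how the fully escaped pattern could be used in Python, assuming the line is available as a string (the names PATTERN, line and values are made up for this example):

```python
import re

# Braces and the asterisk are all escaped, so none of them can be
# mistaken for a quantifier; (.*?) lazily captures the cell content.
PATTERN = re.compile(r'\{\*\}\{(.*?)\}')

# Sample line from the question, written as a raw string.
line = r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\'

values = PATTERN.findall(line)
print(values)  # ['52.7', '3', '$<$ 0.001']
```

Python happens to accept the unescaped braces here as well, but escaping them keeps the pattern unambiguous and portable to other regex engines.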

Upvotes: 1
