Reputation: 626
I have a complicated text file, here is part of it:
& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\
I'm interested in the numbers after the {*}. Here is what I tried with no luck:
import re
m = re.findall(r'{\*}{(.+)}', r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\')
However, I get the following result:
['52.7} & \\multirow{2}{*}{3} & \\multirow{2}{*}{$<$ 0.001']
I tried many other combinations, but I get either only the first number (e.g. 52.7), only the middle number (3), or the result above. How can I get 52.7, 3 and $<$ 0.001 together in a list?
Upvotes: 1
Views: 54
Reputation: 8432
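# Two groups: each result is a (whole match, value) tuple; the non-numeric cell is skipped.
m = re.findall(r'({\*}{([\d|\.?]+?)})+', r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\')
[('{*}{52.7}', '52.7'), ('{*}{3}', '3')]
# One group: only the numeric values are returned.
m = re.findall(r'{\*}{([\d|\.?]+?)}+', r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\')
['52.7', '3']
# Non-greedy match of anything up to the next }: all three values are returned.
m = re.findall(r'{\*}{(.*?)}', r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\')
['52.7', '3', '$<$ 0.001']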
Upvotes: 1
Reputation: 745
That's because the + and * operators are greedy by default. Use the non-greedy form instead:
{\*}{(.+?)}
Reference: http://www.regular-expressions.info/repeat.html ("Watch Out for The Greediness!" section)
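To make the difference concrete, here is a minimal sketch that runs both patterns on the line from the question. It assumes the last cell is written as {$<$ 0.001}, as in the original file excerpt; the variable names are only illustrative.

```python
import re

# Sample line from the question; a raw string keeps the backslashes literal.
line = r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\'

# Greedy: .+ extends to the LAST closing brace on the line,
# so everything between the first {*}{ and that brace is one match.
greedy = re.findall(r'{\*}{(.+)}', line)

# Non-greedy: .+? stops at the FIRST closing brace after each {*}{.
lazy = re.findall(r'{\*}{(.+?)}', line)

print(len(greedy))  # a single, overly long match
print(lazy)         # one entry per cell
```

The greedy version returns one long string, while the non-greedy version returns the three cell values separately.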
Upvotes: 3
Reputation: 2166
Use the following regular expression:
\{\*\}\{(.*?)\}
You should escape all special characters with a backslash (\) and put the non-greedy wildcard .*? inside a capturing group, so that only the captured text ends up in the result list.
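As a rough sketch, the escaped pattern could be wrapped in a small helper like this. The names CELL_VALUE and multirow_values are hypothetical and chosen only for this example. It also assumes the cell values contain no nested braces, since the match ends at the first closing brace.

```python
import re

# Escaped braces and asterisk, with a non-greedy capturing group for the value.
# CELL_VALUE and multirow_values are illustrative names, not from the thread.
CELL_VALUE = re.compile(r'\{\*\}\{(.*?)\}')

def multirow_values(text):
    """Return the contents of every {...} that directly follows {*}."""
    return CELL_VALUE.findall(text)

sample = r'& \multirow{2}{*}{52.7} & \multirow{2}{*}{3} & \multirow{2}{*}{$<$ 0.001}\\'
values = multirow_values(sample)
print(values)
```

Because the pattern has exactly one capturing group, re.findall returns a flat list of the captured strings rather than tuples.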
Upvotes: 1