daydayup

Reputation: 2317

regular expression: may or may not contain a string

I want to match a floating-point number that might be in the form 0.1234567 or 1.23e-5. Here is my Python code:

import re
def main():
    m2 = re.findall(r'\d{1,4}:[-+]?\d+\.\d+(e-\d+)?', '1:0.00003 3:0.123456 8:-0.12345')
    for svs_elem in m2:
        print(svs_elem)

main()

It prints only blank lines. Based on my testing, the problem is in the (e-\d+)? part.

Upvotes: 1

Views: 8619

Answers (2)

chepner

Reputation: 530970

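Use a non-capturing group. The matches are succeeding, but because your pattern contains a capturing group, findall returns the contents of that optional group instead of the whole match. The group doesn't match anything in your input, so each result is an empty string.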

See the output when your input includes something like e-6:

>>> re.findall(r'\d{1,4}:[-+]?\d+\.\d+(e-\d+)?', '1:0.00003 3:0.123456 8:-0.12345e-6')
['', '', 'e-6']

With a non-capturing group ((?:...)):

>>> re.findall(r'\d{1,4}:[-+]?\d+\.\d+(?:e-\d+)?', '1:0.00003 3:0.123456 8:-0.12345e-6')
['1:0.00003', '3:0.123456', '8:-0.12345e-6']
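If you would rather keep the capturing group (say, to look at the exponent on its own), a minimal alternative sketch is to use re.finditer, which yields match objects instead of strings. group(0) on a match object is always the whole match, no matter how many groups the pattern has. The names PATTERN and TEXT below are just illustrative.

```python
import re

# Same pattern as the question: the exponent is still a capturing group.
PATTERN = re.compile(r'\d{1,4}:[-+]?\d+\.\d+(e-\d+)?')
TEXT = '1:0.00003 3:0.123456 8:-0.12345e-6'

# group(0) is the text matched by the entire regex.
full_matches = [m.group(0) for m in PATTERN.finditer(TEXT)]
# group(1) is the exponent, or None when the optional group did not match.
exponents = [m.group(1) for m in PATTERN.finditer(TEXT)]

print(full_matches)  # ['1:0.00003', '3:0.123456', '8:-0.12345e-6']
print(exponents)     # [None, None, 'e-6']
```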

Here are some simpler examples to demonstrate how capturing groups work and how they influence the output of findall. First, no groups:

>>> re.findall("a[bc]", "ab")
["ab"]

Here, the string "ab" matched the regex, so we print everything the regex matched.

>>> re.findall("a([bc])", "ab")
["b"]

This time, we put the [bc] inside a capturing group, so even though the entire string is still matched by the regex, findall only includes the part inside the capturing group in its output.

>>> re.findall("a(?:[bc])", "ab")
["ab"]

Now, by converting the capturing group to a non-capturing group, findall again uses the match of the entire regex in its output.

>>> re.findall("a([bc])?", "a")
['']
>>> re.findall("a(?:[bc])?", "a")
['a']

In both of these final cases, the regular expression as a whole matches, so the return value is a non-empty list. In the first one, though, the capturing group itself doesn't match any text, so the empty string is part of the output. In the second, there is no capturing group, so the match of the entire regex is used for the output.
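A small sketch of the same case through a match object, for comparison: findall turns a group that took no part in the match into an empty string, while re.search reports that group as None and still exposes the whole match as group(0). The variable names are illustrative only.

```python
import re

# findall reports a group that did not take part in the match as ''.
found = re.findall("a([bc])?", "a")
print(found)  # ['']

# A match object reports the same non-participating group as None,
# while group(0) still holds the text the whole regex matched.
m = re.search("a([bc])?", "a")
print(m.group(0))  # a
print(m.group(1))  # None
```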

Upvotes: 3

Ry-

Reputation: 224886

See the help text for findall, in particular the part about groups:

Help on function findall in module re:
findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.
    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result.
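To illustrate the "list of tuples" part of that docstring, here is a sketch that deliberately uses two capturing groups, one for the index before the colon and one for the number after it. Splitting the text this way goes beyond the original question and is only an assumption about how you might want the data. The exponent stays non-capturing, so it does not add a third element to each tuple.

```python
import re

# Two capturing groups -> findall returns a list of 2-tuples.
pairs = re.findall(r'(\d{1,4}):([-+]?\d+\.\d+(?:e-\d+)?)',
                   '1:0.00003 3:0.123456 8:-0.12345e-6')
print(pairs)
# [('1', '0.00003'), ('3', '0.123456'), ('8', '-0.12345e-6')]

# Hypothetical follow-up: map each index to its numeric value.
values = {int(k): float(v) for k, v in pairs}
```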

You have a group, so it’s returned instead of the entire match, but it doesn’t match in any of your cases. Make it non-capturing with (?:e-\d+):

m2 = re.findall(r'\d{1,4}:[-+]?\d+\.\d+(?:e-\d+)?', '1:0.00003 3:0.123456 8:-0.12345')
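Putting it together, a sketch of the question's program with only the group made non-capturing and print written as a function call (so it runs under Python 3). Returning the list from main is an addition for convenience.

```python
import re

# The exponent is wrapped in a non-capturing group (?:...), so findall
# returns the whole match for each number instead of the group's text.
PATTERN = r'\d{1,4}:[-+]?\d+\.\d+(?:e-\d+)?'


def main():
    m2 = re.findall(PATTERN, '1:0.00003 3:0.123456 8:-0.12345')
    for svs_elem in m2:
        print(svs_elem)
    return m2


if __name__ == "__main__":
    main()
```

This prints 1:0.00003, 3:0.123456 and 8:-0.12345 on separate lines. Note that, as written, the exponent part only accepts a lowercase e followed by a minus sign, as in 1.23e-5; something like [eE][-+]?\d+ would be needed for forms such as 1.23E+5.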

Upvotes: 4
