steven smith

Reputation: 1577

Perl regex vs. Python regex non-capture seems to work differently

I'm trying to capture the number in a decimal-inches value such as 1.5" without capturing the trailing ". An expression that works well in Perl seems to fail in Python, and I can't see why.

With the following two expressions I expect to get 1 and 1.5, but Python returns 1" and 1.5" instead. I expected both languages to behave the same. What am I missing?

Perl:

  DB<15> x '1"' =~ m{^(?:(\d+(?:\.\d+)*)")}  
0  1
  DB<16> x '1.5"' =~ m{^(?:(\d+(?:\.\d+)*)")}
0  1.5

Python:

>>> re.search(r'^(?:(\d+(?:\.\d+)*)")', '1"').group()
'1"'
>>> re.search(r'^(?:(\d+(?:\.\d+)*)")', '1.5"').group()
'1.5"'

Ultimately I was hoping to use an expression like ^(?:(\d+)\')|(?:(\d+(?:\.\d+)*)") to match either 1', 1" or 1.5" and, from the position of the group that matched, tell which alternative worked. hwnd pointed out findall, which I had previously overlooked, so I expect my solution will look something like:

>>> re.findall(r'^(?:(\d+)\')|(?:(\d+(?:\.\d+)*)")', '1\'')
[('1', '')]
>>> re.findall(r'(?:(\d+)\')|(?:(\d+(?:\.\d+)*)")', '1\' 1" 1.5"')
[('1', ''), ('', '1'), ('', '1.5')]
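One thing to watch in the anchored version: alternation has the lowest precedence, so the leading ^ belongs only to the first alternative. A minimal sketch of the difference, assuming a made-up input 'x 1"' that does not start with a number (the variable names are only for illustration):

```python
import re

# ^ applies only to the feet alternative; the inches alternative is unanchored.
loose = r'^(?:(\d+)\')|(?:(\d+(?:\.\d+)*)")'
# Wrapping both alternatives in one non-capturing group anchors them together.
anchored = r'^(?:(\d+)\'|(\d+(?:\.\d+)*)")'

text = 'x 1"'  # hypothetical input, not from the question

loose_hits = re.findall(loose, text)
anchored_hits = re.findall(anchored, text)

print(loose_hits)     # the inches alternative still matches mid-string
print(anchored_hits)  # nothing matches, because both alternatives are anchored
```

If the anchor is meant to cover both units, the second form is probably the one you want.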

Here is another interesting possibility using finditer/groupdict/comprehension:

>>> [m.groupdict() for m in re.finditer(r'(?P<feet>(\d+)\')|(?P<inches>(\d+(?:\.\d+)*)")', '1\' 1" 1.5"')]
[{'feet': "1'", 'inches': None},
 {'feet': None, 'inches': '1"'},
 {'feet': None, 'inches': '1.5"'}]
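In the output above the named groups still include the unit mark, because the quote sits inside each named group. A possible variation, as a sketch only: put the names on the digits alone and use m.lastgroup to see which alternative matched. The helper name parse_lengths is invented for this example.

```python
import re

# Named groups wrap only the digits; the ' and " stay outside the groups.
pattern = re.compile(r'(?P<feet>\d+)\'|(?P<inches>\d+(?:\.\d+)*)"')

def parse_lengths(text):
    """Return (unit, number) pairs; m.lastgroup names the group that matched."""
    return [(m.lastgroup, m.group(m.lastgroup)) for m in pattern.finditer(text)]

parsed = parse_lengths('1\' 1" 1.5"')
print(parsed)
```

This relies on only one named group taking part in each match, which holds here because the two groups sit in separate alternatives.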

Thank you all for another enlightening trip into Python.

Upvotes: 2

Views: 269

Answers (3)

hwnd

Reputation: 70732

You can do this easily with findall:

import re

string = 'I have values "1" and "1.5" also "12.555"'
m = re.findall(r'\"(\d+|\d+\.\d+)\"', string)
print(", ".join(m))

Output:

1, 1.5, 12.555

Upvotes: 1

Sinan Ünür

Reputation: 118148

Try:

re.search(r'^(?:(\d+(?:\.\d+)*)")', '1.5"').group(1)

See re:

group([group1, ...])

Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. (emphasis mine)
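To make the quoted behaviour concrete, here is a small sketch using the pattern from the question. As far as I can tell, the Perl debugger output in the question shows the list of captures (a match in list context returns the captured groups), which corresponds to groups() or group(1) in Python rather than group().

```python
import re

m = re.search(r'^(?:(\d+(?:\.\d+)*)")', '1.5"')

whole = m.group()        # same as m.group(0): the entire match, quote included
first = m.group(1)       # only the first capturing group
all_groups = m.groups()  # tuple of every capturing group, like Perl's list result

print(whole, first, all_groups)
```

The (?:...) groups are non-capturing, so they never get a number; the pattern has exactly one capturing group.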

Now, you can use named capture groups both in Perl (unless you are stuck with a very old version) and Python.

So, I would actually recommend:

>>> re.search(r'^(?:(?P<inches>\d+(?:\.\d+){0,1})")', '1.5"').groupdict()['inches']
'1.5'

Upvotes: 5

Paul Evans

Reputation: 27577

The trailing " is part of the overall match, which is what group() without arguments returns. To keep it out of the match, use a lookahead instead:

re.search(r'^(?:(\d+(?:\.\d+)*))(?=")', '1.5"').group()
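A short sketch of why this works, with the pattern simplified by dropping the extra groups (that simplification is mine): a lookahead checks that the " follows but does not consume it, so it never becomes part of the match.

```python
import re

# The quote is consumed, so it is part of the overall match.
with_quote = re.search(r'^\d+(?:\.\d+)*"', '1.5"')
# (?=") only asserts that a quote follows; the match stops before it.
lookahead = re.search(r'^\d+(?:\.\d+)*(?=")', '1.5"')
# Without a trailing quote the lookahead fails and there is no match at all.
no_unit = re.search(r'^\d+(?:\.\d+)*(?=")', '1.5')

print(with_quote.group())
print(lookahead.group(), lookahead.end())
print(no_unit)
```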

Upvotes: 0
