Is it possible to not capture certain elements of a character set in a Python regular expression?

Question

I've written a parse_area function which parses the string '1,500 sqft' into the number 1500, like so:

import re
import pytest

def parse_area(string):
    return int(re.sub(',', '', re.search(r'[\d,]+(?= sqft)', string)[0]))

def test_parse_area():
    assert parse_area('1,500 sqft') == 1500


if __name__ == "__main__":
    pytest.main([__file__])

I was wondering whether this function could be written more concisely by not capturing the , characters matched by the [\d,] character set in the first place. I thought of using a non-capturing group, but according to https://docs.python.org/3/library/re.html, special characters such as parentheses lose their special meaning inside a character set.
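To illustrate the limitation, here is a minimal sketch. The pattern is purely illustrative, an attempt to put non-capturing-group syntax inside a set, and not something to use in practice. The characters (, ?, : and ) are treated as literals, so the comma still ends up in the match:

```python
import re

# Inside [...], the characters '(', '?', ':' and ')' are ordinary literals,
# so this set matches digits, commas, and those four punctuation characters.
# It does NOT create a non-capturing group.
pattern = re.compile(r'[\d(?:,)]+')

# The comma is still part of the matched text.
print(pattern.search('1,500 sqft')[0])

# The "group" syntax itself is matched literally.
print(pattern.search('(?:)')[0])
```

Since a match is always a contiguous slice of the input string, the comma cannot be skipped by the pattern itself and has to be removed in a separate step.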

Is this the most concise the function can be?

Kurt Peek · Accepted Answer

I decided that it would be better to do the parsing in two steps anyway, in order to cover the case where no match is found; in that case I'd like the parse_area function to return None instead of raising an error. So I finally wrote it like this:

import pytest
import re


def parse_area(string):
    """Parse the string '1,500 sqft' into the integer 1500"""
    m = re.search(r'[\d,]+(?= sqft)', string)
    return int(m[0].replace(',', '')) if m else None

def test_parse_area():
    assert parse_area('1,500 sqft') == 1500

def test_parse_area_null_case():
    assert parse_area('no area here') is None


if __name__ == "__main__":
    pytest.main([__file__])

and both tests pass. (Note that with the original implementation, the second test would raise TypeError: 'NoneType' object is not subscriptable.)
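The difference between the two versions can be sketched side by side. The name parse_area_one_step is hypothetical, introduced here only to keep the original one-liner around for comparison:

```python
import re


def parse_area_one_step(string):
    # Original one-liner: subscripts the result of re.search directly,
    # which fails when re.search returns None.
    return int(re.sub(',', '', re.search(r'[\d,]+(?= sqft)', string)[0]))


def parse_area(string):
    # Two-step version: check for a match before using it.
    m = re.search(r'[\d,]+(?= sqft)', string)
    return int(m[0].replace(',', '')) if m else None


try:
    parse_area_one_step('no area here')
except TypeError as exc:
    # re.search returned None, and None cannot be subscripted.
    print(exc)

print(parse_area('no area here'))  # prints None
```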
