Reputation: 57611
I've written a parse_area
function which parses the string '1,500 sqft'
into the number 1500
, like so:
import re
import pytest
def parse_area(string):
return int(re.sub(',', '', re.search(r'[\d,]+(?= sqft)', string)[0]))
def test_parse_area():
assert parse_area('1,500 sqft') == 1500
if __name__ == "__main__":
pytest.main([__file__])
I was wondering whether it might be possible to write this function more concisely, by not capturing the ,
elements in the [\d,]
character set in the first place. I thought of using a non-capturing group, but according to https://docs.python.org/3/library/re.html the parentheses, etc. have no special meaning inside a character set.
Is this the most concise the function can be?
Upvotes: 0
Views: 51
Reputation: 57611
I considered that it might be better to do the parsing in two steps anyways in order to cover the case that no match can be found, in which case I'd like the parse_area
function to return None
instead of throwing an error. So I finally wrote it like this:
import pytest
import re
def parse_area(string):
"""Parse the string '1,500 sqft' into the integer 1500"""
m = re.search(r'[\d,]+(?= sqft)', string)
return int(m[0].replace(',', '')) if m else None
def test_parse_area():
assert parse_area('1,500 sqft') == 1500
def test_parse_area_null_case():
assert parse_area('no area here') == None
if __name__ == "__main__":
pytest.main([__file__])
and both tests pass. (Note that with the original implementation, the second test would throw a 'NoneType' not subscriptable
error).
Upvotes: 1