Reputation: 1
I'm trying to capture the prices (either a single price, or the 2 prices of a price range) from the text below, which can take one of two formats.
I'm using the following regex code, but in Format 1 it only captures the first group/price and not the second one. Can anyone tell me how to modify the regex to capture both prices (when the second price exists)?
import re
pattern = re.compile(r'Indicative selling price.*?\$(\d{5,6}).*?\$?(\d{5,6})?.*Median sale price', re.DOTALL)
prices = pattern.findall(text)
Format 1
Indicative selling price
(*Delete single price or range as applicable)
Single price $ or range between $740000 & $760000
Median sale price
Format 2
Indicative selling price
(*Delete single price or range as applicable)
Single price $740000 or range between &
Median sale price
Upvotes: 0
Views: 339
Reputation: 19661
The main problem with your pattern is that the .*? part after the first group is lazy and the two elements following it are optional. Therefore, that .*? doesn't have to match anything and neither do the optional elements, so the final .* comes and matches everything else.
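To see this behaviour directly, here is a minimal sketch that runs the pattern from the question against the Format 1 sample. The variable names `text`, `original`, and `prices` are only illustrative, and the sample is assumed to contain a single record.

```python
import re

# Sample text copied from Format 1 in the question (single record assumed).
text = (
    "Indicative selling price\n"
    "(*Delete single price or range as applicable)\n"
    "Single price $ or range between $740000 & $760000\n"
    "Median sale price\n"
)

# The pattern from the question, unchanged.
original = re.compile(
    r'Indicative selling price.*?\$(\d{5,6}).*?\$?(\d{5,6})?.*Median sale price',
    re.DOTALL,
)

# After group 1, the lazy .*? matches nothing, \$? and (\d{5,6})? both match
# nothing at " & $760000", and the greedy .* consumes the rest of the text.
prices = original.findall(text)
print(prices)  # [('740000', '')]
```

The second element of the tuple is an empty string because `findall` reports a group that did not take part in the match as `''`.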
Moreover, when part of the string is optional, you shouldn't make each element of it individually optional; otherwise, your pattern won't work as intended (e.g., it could capture one element, the digits, when the other, the "$", is missing). To fix both problems, you should put the whole optional part in a non-capturing group and make that group optional: (?:.*?\$(\d{5,6}))?.
Full pattern:
\bIndicative selling price.*?\$(\d{5,6})(?:.*?\$(\d{5,6}))?.*?Median sale price\b
Demo.
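As a quick check, the full pattern can be run against both sample formats from the question. This is a sketch that assumes the `re.DOTALL` flag is kept as in the original code (so `.` can cross line breaks) and that each text holds a single record; `format_1`, `format_2`, and `fixed` are illustrative names.

```python
import re

# Sample texts copied from the two formats in the question.
format_1 = (
    "Indicative selling price\n"
    "(*Delete single price or range as applicable)\n"
    "Single price $ or range between $740000 & $760000\n"
    "Median sale price\n"
)
format_2 = (
    "Indicative selling price\n"
    "(*Delete single price or range as applicable)\n"
    "Single price $740000 or range between &\n"
    "Median sale price\n"
)

# The whole "second price" part is wrapped in one optional non-capturing group.
fixed = re.compile(
    r'\bIndicative selling price.*?\$(\d{5,6})(?:.*?\$(\d{5,6}))?.*?Median sale price\b',
    re.DOTALL,
)

range_prices = fixed.findall(format_1)
single_price = fixed.findall(format_2)
print(range_prices)  # [('740000', '760000')]
print(single_price)  # [('740000', '')]
```

When only one price is present, the second group does not participate and `findall` returns `''` for it, so the caller can filter out empty strings if needed.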
One more thing: If the two prices of a range are always separated by "&", then you should be explicit about that (i.e., use (?:[ ]&[ ]\$(\d{5,6}))? instead of (?:.*?\$(\d{5,6}))?).
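One situation where the explicit form appears to matter is a text that holds several records back to back. The sketch below assumes such a text (a Format 2 record followed by a Format 1 record, built from the question's samples) and compares the two variants; `lazy`, `explicit`, and `text` are illustrative names, and `re.DOTALL` is assumed as in the question.

```python
import re

# Assumed input: a Format 2 record immediately followed by a Format 1 record.
text = (
    "Indicative selling price\n"
    "(*Delete single price or range as applicable)\n"
    "Single price $740000 or range between &\n"
    "Median sale price\n"
    "Indicative selling price\n"
    "(*Delete single price or range as applicable)\n"
    "Single price $ or range between $740000 & $760000\n"
    "Median sale price\n"
)

head = r'\bIndicative selling price.*?\$(\d{5,6})'
tail = r'.*?Median sale price\b'

# Variant 1: optional group with a lazy .*? before the second price.
lazy = re.compile(head + r'(?:.*?\$(\d{5,6}))?' + tail, re.DOTALL)
# Variant 2: optional group that requires " & $" right after the first price.
explicit = re.compile(head + r'(?:[ ]&[ ]\$(\d{5,6}))?' + tail, re.DOTALL)

lazy_result = lazy.findall(text)
explicit_result = explicit.findall(text)

# With DOTALL, the lazy .*? can run past the first record's end marker and
# pick up a price from the next record, merging two records into one match.
print(lazy_result)      # [('740000', '740000')]
# The explicit separator keeps each match inside its own record.
print(explicit_result)  # [('740000', ''), ('740000', '760000')]
```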
Upvotes: 1