Reputation: 103
I have a string:
foo bar $ 123.456 bar foo $ 652 $ 1.255.250 bar $ 2.000 foo badword $ 300.000 foo bar $ 123 badword2 $ 400
And I want to match all the prices, except the ones that follow a "badword".
Match:
123.456
652
1.255.250
2.000
123
Do not match:
badword $ 300.000
badword2 $ 400
I'm developing in Python 3.6 and using (\d+).(\d+) to capture the prices so far.
Upvotes: 0
Views: 61
Reputation: 163277
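The pattern (\d+).(\d+) captures one or more digits in group 1 and one or more digits in group 2, but the unescaped dot matches any character, so it would also match 123a456. It also requires at least two digits, so a price without a dot such as 652 is not captured as a whole.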
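A quick way to see this, as a small sketch (the names loose and strict and the sample inputs are only illustrative):

```python
import re

loose = re.compile(r"(\d+).(\d+)")    # unescaped dot: matches any character
strict = re.compile(r"(\d+)\.(\d+)")  # escaped dot: matches a literal "."

print(loose.findall("123a456"))   # [('123', '456')] - the letter is accepted
print(strict.findall("123a456"))  # []
print(loose.findall("652"))       # [('6', '2')] - a plain number gets split up
```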
One option to capture the prices is to match what you do not want, (?:badword|badword2) \$ \d+(?:\.\d+)*, and to capture what you do want in a group, \$ (\d+(?:\.\d+)*), combining the two with an alternation:
(?:badword|badword2) \$ \d+(?:\.\d+)*|\$ (\d+(?:\.\d+)*)
That would match
(?:                  Non-capturing group
badword|badword2     Match the bad words
)                    Close non-capturing group
 \$                  Match a space, a literal $ and a space
\d+(?:\.\d+)*        Match 1 or more digits followed by (a dot and 1 or more digits) repeated 0 or more times
|                    Or
\$                   Match a literal $ followed by a space
(                    Capturing group (your digits will be in here)
\d+(?:\.\d+)*        Match 1 or more digits followed by (a dot and 1 or more digits) repeated 0 or more times
)                    Close capturing group
You can extend the alternation with the badwords you want to add.
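In Python this could look roughly like the sketch below. The bad_words list and the variable names are assumptions for illustration; the pattern is built from that list so that more words can be added. Matches of the unwanted alternative leave group 1 unset, so they are filtered out.

```python
import re

s = ("foo bar $ 123.456 bar foo $ 652 $ 1.255.250 bar $ 2.000 "
     "foo badword $ 300.000 foo bar $ 123 badword2 $ 400")

# Hypothetical list of words whose following price should be skipped.
bad_words = ["badword", "badword2"]

price = r"\d+(?:\.\d+)*"
pattern = re.compile(
    r"(?:%s) \$ %s|\$ (%s)" % ("|".join(map(re.escape, bad_words)), price, price)
)

# group(1) is None when the "bad word" alternative matched, so skip those.
prices = [m.group(1) for m in pattern.finditer(s) if m.group(1)]
print(prices)  # ['123.456', '652', '1.255.250', '2.000', '123']
```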
Upvotes: 2
Reputation: 22817
Personally, I'd use this more pythonic approach with a list comprehension. It extracts each price together with the text in front of it into two groups (preceding words, price), drops the matches whose word group contains badword, and prints only the price values.
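import re
s = "foo bar $ 123.456 bar foo $ 652 $ 1.255.250 bar $ 2.000 foo badword $ 300.000 foo bar $ 123 badword2 $ 400"
# Group 1: the text before the "$"; group 2: the price (1-3 digits, then ".ddd" groups)
r = re.compile(r"([^$]+)\$\s*(\d{1,3}(?:\.\d{3})*)")
# Keep only the prices whose preceding text does not contain "badword"
print([x[1] for x in r.findall(s) if "badword" not in x[0]])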
The regex used in the code above is:
([^$]+)\$\s*(\d{1,3}(?:\.\d{3})*)
The following regular expression may also be used:
([^$]+)\$\s*([\d.]+)
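The two patterns differ on malformed input: the first only accepts dots followed by exactly three digits, while the second accepts any run of digits and dots. A small comparison, with a made-up sample string and illustrative variable names:

```python
import re

strict = re.compile(r"([^$]+)\$\s*(\d{1,3}(?:\.\d{3})*)")
loose = re.compile(r"([^$]+)\$\s*([\d.]+)")

sample = "a $ 1.255.250 b $ 7..5"  # the second "price" is deliberately malformed

strict_prices = [x[1] for x in strict.findall(sample)]
loose_prices = [x[1] for x in loose.findall(sample)]

print(strict_prices)  # ['1.255.250', '7']
print(loose_prices)   # ['1.255.250', '7..5']
```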
Upvotes: 0