I have a question about writing a regex in Python.
The string is:
abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11.
My goal is to use a one-line regular expression to capture all the numbers.
Moreover, I want to put the numbers into different groups: 2.3, 4.5, 6.7, 8.9 should be under the group abc rules, and 3.6, 6.7, 8.9 and 10.11 should be under def rules.
I have tried to use the regex:
(?<=abc rules) \d{1,2}.\d{1,2}
to capture all the numbers after abc rules, but I only get the first number.
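Running that attempt with re.findall shows the problem. The lookbehind (?<=abc rules) only succeeds at a position immediately preceded by abc rules, so only the number right after the label can match. The unescaped . also matches any character, and the literal space in the pattern ends up in the result.

```python
import re

s = "abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11."

# The lookbehind requires "abc rules" directly to the left of the match,
# so only the first number after the label qualifies.
attempt = r"(?<=abc rules) \d{1,2}.\d{1,2}"
found = re.findall(attempt, s)
print(found)  # [' 2.3']
```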
How can I achieve the goal?
Thanks everyone!
You can use
import re
# Group 1 captures the number list that follows "abc rules" or "def rules"
rx = r"\b(?:abc|def)\s+rules\s+(\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*)"
s = "abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11."
# Split each captured list on commas or on the whole word "and"
print([re.split(r'\s*(?:,|\band\b)\s*', x) for x in re.findall(rx, s)])
# => [['2.3', '4.5', '6.7', '8.9'], ['3.6', '6.7', '8.9', '10.11']]
The idea is to match each substring that contains the numbers, capture only the number part, and then split each captured part with the \s*(?:,|\band\b)\s* regex.
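Since the question asks for the numbers to end up under abc rules and def rules, one possible variant, sketched here assuming a dict keyed by label is an acceptable result, captures the label in an extra group. The names labeled_rx, sep_rx and groups are illustrative only.

```python
import re

s = "abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11."

# Same pattern as in the answer, but the label is captured too (group 1),
# so the number list (group 2) stays attached to its label.
labeled_rx = r"\b((?:abc|def)\s+rules)\s+(\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*)"
sep_rx = r"\s*(?:,|\band\b)\s*"

# With two groups, re.findall returns (label, numbers) tuples.
groups = {label: re.split(sep_rx, nums) for label, nums in re.findall(labeled_rx, s)}
print(groups)
# {'abc rules': ['2.3', '4.5', '6.7', '8.9'], 'def rules': ['3.6', '6.7', '8.9', '10.11']}
```

If the same label can occur more than once in the input, later lists overwrite earlier ones in the dict; collecting into a list of (label, numbers) pairs avoids that.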
This matches all the substrings:
\b(?:abc|def)\s+rules\s+(\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*)
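To see locally what the pattern matches, re.finditer can show each whole match next to its Group 1 capture. Note that the " and" before def rules is not part of the first match: the repeated (?:,|\s*and)\s*\d*\.*?\d+ sequence must end in a number, so it backtracks when it hits def.

```python
import re

rx = r"\b(?:abc|def)\s+rules\s+(\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*)"
s = "abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11."

# group(0) is the whole match (label + list); group(1) is only the list part.
matches = [(m.group(0), m.group(1)) for m in re.finditer(rx, s)]
for whole, numbers in matches:
    print(repr(whole), "->", repr(numbers))
```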
Details:
\b - a word boundary
(?:abc|def) - either abc or def
\s+ - one or more whitespace characters
rules - the literal substring rules
\s+ - one or more whitespace characters
(\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*) - Group 1, capturing:
  \d*\.*?\d+ - an int or float number
  (?:(?:,|\s*and)\s*\d*\.*?\d+)* - zero or more sequences of:
    (?:,|\s*and) - a comma, or zero or more whitespace characters followed by and
    \s* - zero or more whitespace characters
    \d*\.*?\d+ - an int or float number

The \s*(?:,|\band\b)\s* regex matches a comma or the whole word and, together with any whitespace around it.
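Both helper patterns can be checked in isolation. In this small sketch, the separator regex splits on commas and on and only as a whole word (the \b anchors stop it from splitting inside a word such as band). The number sub-pattern \d*\.*?\d+ accepts integers and decimals, but because \.*? allows any number of dots it also accepts strings like 1..2; \d*\.?\d+ is a stricter alternative if that matters. That tweak is a suggestion, not part of the pattern above.

```python
import re

sep_rx = r"\s*(?:,|\band\b)\s*"
num_rx = r"\d*\.*?\d+"

# Commas and the standalone word "and" both act as separators.
parts = re.split(sep_rx, "3.6, 6.7, 8.9 and 10.11")

# \b prevents "and" inside a longer word from being treated as a separator.
word_safe = re.split(sep_rx, "band, 1")

# The loose number pattern accepts all of these, including "1..2".
loose = {t: bool(re.fullmatch(num_rx, t)) for t in ["2.3", "10", ".5", "1..2"]}

# Allowing at most one dot rejects "1..2".
strict = bool(re.fullmatch(r"\d*\.?\d+", "1..2"))

print(parts, word_safe, loose, strict)
```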