Yicheng Wang

Reputation: 41

Regex match multiple numbers after keywords

I have a question about writing a regex in Python.

The string is:

abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11.

My goal is to use a one-line regular expression to capture all the numbers.

Moreover, I want to put the numbers into different groups: 2.3, 4.5, 6.7, 8.9 should be under the group abc rules, and 3.6, 6.7, 8.9 and 10.11 should be under def rules.

I have tried the regex (?<=abc rules) \d{1,2}.\d{1,2} to capture all the numbers after abc rules, but I only get the first number.

How can I achieve the goal?

Thanks everyone!

Upvotes: 3

Views: 1619

Answers (1)

Wiktor Stribiżew

Reputation: 627488

You can use

import re

# Match each keyword and capture its whole number list in Group 1
rx = r"\b(?:abc|def)\s+rules\s+(\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*)"
s = "abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11."
# Split every captured list on commas or the whole word "and"
print([re.split(r'\s*(?:,|\band\b)\s*', x) for x in re.findall(rx, s)])
# => [['2.3', '4.5', '6.7', '8.9'], ['3.6', '6.7', '8.9', '10.11']]

See the Python demo

The point is to match each keyword together with the list of numbers that follows it, capture only the number list, and then split that captured text with the \s*(?:,|\band\b)\s* regex.
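If you also need each list stored under its keyword (as the question asks), one possible variation is to capture the keyword in a second group and build a dictionary. This is only a sketch: the helper name numbers_by_keyword and the extra capturing group are my own additions, not part of the pattern above.

```python
import re

# Same pattern as above, but the keyword gets its own capturing group
# (an assumption added for this sketch) so it can serve as a dict key.
PATTERN = re.compile(
    r"\b((?:abc|def)\s+rules)\s+"
    r"(\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*)"
)
SEPARATOR = re.compile(r"\s*(?:,|\band\b)\s*")


def numbers_by_keyword(text):
    """Return {keyword: [number strings]} (hypothetical helper)."""
    return {
        keyword: SEPARATOR.split(number_list)
        for keyword, number_list in PATTERN.findall(text)
    }


s = "abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11."
result = numbers_by_keyword(s)
print(result)
```

Note that if the same keyword occurs more than once in the text, the later list overwrites the earlier one in this version.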

This matches all the substrings:

\b(?:abc|def)\s+rules\s+(\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*)

See the regex demo

Details:

  • \b - a word boundary
  • (?:abc|def) - either abc or def
  • \s+ - 1 or more whitespaces
  • rules - a substring rules
  • \s+ - 1 or more whitespaces
  • (\d*\.*?\d+(?:(?:,|\s*and)\s*\d*\.*?\d+)*) - Group 1 capturing:
    • \d*\.*?\d+ - an int or float number
    • (?:(?:,|\s*and)\s*\d*\.*?\d+)* - zero or more sequences of:
      • (?:,|\s*and) - , or 0+ whitespaces and then and
      • \s* - 0+ whitespaces
      • \d*\.*?\d+ - an int or float number
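The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows inline comments. This is the same regex, only laid out differently as an illustration:

```python
import re

rx = re.compile(r"""
    \b(?:abc|def)       # word boundary, then either keyword
    \s+rules\s+         # the word rules surrounded by 1+ whitespace
    (                   # Group 1: the whole number list
      \d*\.*?\d+        #   first int or float number
      (?:               #   zero or more further items:
        (?:,|\s*and)    #     a comma, or 0+ whitespace followed by and
        \s*             #     0+ whitespace
        \d*\.*?\d+      #     next int or float number
      )*
    )
""", re.VERBOSE)

s = "abc rules 2.3, 4.5, 6.7, 8.9 and def rules 3.6, 6.7, 8.9 and 10.11."
groups = rx.findall(s)
print(groups)
# => ['2.3, 4.5, 6.7, 8.9', '3.6, 6.7, 8.9 and 10.11']
```

The trailing " and" after 8.9 in the first list is not included because no number follows it, so the optional repeated group stops at 8.9.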

The \s*(?:,|\band\b)\s* regex matches a comma or the whole word and, surrounded by 0+ whitespaces.
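To see the splitting step in isolation, here is a small sketch applied to one captured list. The conversion to float at the end is an optional extra I am assuming you might want; the variable names are illustrative.

```python
import re

separator = re.compile(r"\s*(?:,|\band\b)\s*")

# One captured number list, as returned in Group 1 by the main pattern
parts = separator.split("3.6, 6.7, 8.9 and 10.11")
print(parts)   # => ['3.6', '6.7', '8.9', '10.11']

# Optional: turn the strings into numbers
values = [float(p) for p in parts]
print(values)  # => [3.6, 6.7, 8.9, 10.11]
```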

Upvotes: 4
