ebnius

Reputation: 930

Lazy match for a pattern

Given the string 1.blah blah2.yada yada, I'd like to extract 1.blah blah and 2.yada yada. I tried \d\..+ but that matches the entire string, while \d\..+? matches only 1.b and 2.y. I just need to match the pattern lazily. Any ideas?

Upvotes: 1

Views: 170

Answers (1)

Wiktor Stribiżew

Reputation: 627292

The greedy .+ at the end of the pattern matches 1+ chars other than line break chars, as many as possible, so it runs up to the string/line end. A lazy .+? at the end of the pattern matches exactly 1 char: +? requires at least 1 char to be present, and since nothing follows it in the pattern, it never needs to take more.
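
To make the difference concrete, here is a small sketch that runs the two patterns from the question against the question's sample string. The variable names are only illustrative.

```python
import re

s = "1.blah blah2.yada yada"

# Greedy: .+ takes as many chars as possible, so one match covers the whole string.
greedy = re.findall(r"\d\..+", s)

# Lazy: .+? takes as few chars as possible; nothing follows it, so it stops after 1 char.
lazy = re.findall(r"\d\..+?", s)

print(greedy)  # => ['1.blah blah2.yada yada']
print(lazy)    # => ['1.b', '2.y']
```

A lazy quantifier only expands when the rest of the pattern forces it to, which is why the pattern needs something after it that marks where each item ends.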

You may use

\d+\..*?(?=\d+\.|$)

Add the re.DOTALL flag if there can be line breaks inside the string.

Details

  • \d+ - 1+ digits
  • \. - a dot
  • .*? - any 0+ chars other than line break chars (if re.DOTALL is used, even including line break chars), as few as possible, up to (but excluding) the first occurrence of...
  • (?=\d+\.|$) - a positive lookahead that requires either of two alternatives immediately to the right: 1+ digits followed by a dot, or the end of the string. The lookahead does not consume any text.

Python demo:

import re
rx = r"\d+\..*?(?=\d+\.|$)"
s = "1.blah blah2.yada yada3.yadddaaa"
print(re.findall(rx, s))
# => ['1.blah blah', '2.yada yada', '3.yadddaaa']
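
The re.DOTALL note can be checked the same way. This is a minimal sketch with a made-up sample string in which each item contains a line break:

```python
import re

rx = r"\d+\..*?(?=\d+\.|$)"
s = "1.blah\nblah2.yada\nyada"  # hypothetical input with line breaks inside each item

# Without re.DOTALL, . cannot cross "\n", so the lookahead is never reached.
without_flag = re.findall(rx, s)

# With re.DOTALL, . also matches "\n", so each item is captured in full.
with_flag = re.findall(rx, s, flags=re.DOTALL)

print(without_flag)  # => []
print(with_flag)     # => ['1.blah\nblah', '2.yada\nyada']
```

Here $ matches only at the very end of the string because re.MULTILINE is not set, so the inner line breaks do not end a match early.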

Upvotes: 1
