Regex to match specific strings but only first string on new line

Question

Using Python regex, I'm trying to scrape some Behat scenarios. Here is a regex: https://regex101.com/r/EGdK3O/1 (Scenario:([\s\S]*?)(And|When|Then|Given)).

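The current version of my code is items = re.findall(r'Scenario:([\s\S]*?)(And|When|Then|Given|#)', contents, re.MULTILINE). This works, except when one of these keywords also appears in the scenario description: the lazy match stops at the first occurrence anywhere, so the description is cut short.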
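To make the failure concrete, here is a small sketch with a made-up feature file whose scenario title contains "And". The lazy [\s\S]*? stops at the first keyword it finds, wherever it is. Note that re.MULTILINE has no effect on this pattern, since it only changes how ^ and $ behave and the pattern uses neither.

```python
import re

# Hypothetical sample: the scenario title itself contains the keyword "And".
contents = """Feature: Login
  Scenario: Log in And see the dashboard
    Given I am on "/login"
    Then I should see "Dashboard"
"""

# The pattern from the question: lazily match up to the first keyword anywhere.
items = re.findall(r'Scenario:([\s\S]*?)(And|When|Then|Given|#)', contents, re.MULTILINE)
print(items)  # [(' Log in ', 'And')]
```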

What I'm having trouble figuring out is how to match (And|When|Then|Given) only when the keyword is the first word on a new line. Even better would be to also match when that line starts with a tab or some number of spaces.
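In Python's re, "first word on a line" can be expressed with ^ plus re.MULTILINE, which makes ^ match right after every newline; [ \t]* then allows leading tabs or spaces. The sketch below uses a made-up sample. Its second pattern is one possible alternative (not the approach from the accepted answer) that cuts the description with a lazy match under re.DOTALL.

```python
import re

contents = """  Scenario: Log in And see the dashboard
    Given I am on "/login"
    And I press "Log in"
"""

# ^ with re.MULTILINE anchors at the start of every line, so a keyword only
# counts when it is the first word on its line (after optional spaces/tabs).
step_start = re.compile(r'^[ \t]*(?:And|When|Then|Given)\b', re.MULTILINE)
steps = [m.group().strip() for m in step_start.finditer(contents)]
print(steps)  # ['Given', 'And']

# Alternative sketch: with re.DOTALL, .*? may cross newlines, and it stops at
# the first line that starts with a step keyword.
descriptions = re.findall(
    r'Scenario:(.*?)^[ \t]*(?:And|When|Then|Given)\b',
    contents,
    re.MULTILINE | re.DOTALL,
)
print(descriptions)  # [' Log in And see the dashboard\n']
```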

The ultimate goal here is to get the Scenario description but not the steps.

The fourth bird · Accepted Answer

You could match Scenario: followed by a capturing group that first matches the rest of that line, up to the end of the line without matching a newline.

Then, inside that same single capturing group, repeatedly match the following lines as long as they do not start with 1+ tabs or spaces followed by And, When, Then or Given. Finally, after the capturing group, match the newline and the line that starts with one of those keywords.

\bScenario:(.*(?:\r?\n(?![ \t]+(?:And|[WT]hen|Given)).*)*)\r?\n[ \t]+(?:And|[WT]hen|Given)
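A minimal sketch of using this pattern from Python, assuming the feature file has already been read into a string named contents (the sample below is made up). The keyword group inside the lookahead is written as non-capturing (?:...) so that re.findall returns one string per scenario rather than tuples; whitespace in each description is then collapsed with split/join.

```python
import re

# Raw string so \b, \r, \n and \t reach the regex engine unchanged.
pattern = re.compile(
    r'\bScenario:(.*(?:\r?\n(?![ \t]+(?:And|[WT]hen|Given)).*)*)\r?\n[ \t]+(?:And|[WT]hen|Given)'
)

# Hypothetical sample: the first title contains "And", and its description
# continues on a second line before the steps start.
contents = """Feature: Login
  Scenario: Log in And see the dashboard
    so that returning users land on their home page
    Given I am on "/login"
    And I press "Log in"
    Then I should see "Dashboard"

  Scenario: Log out
    Given I am logged in
    When I press "Log out"
"""

# Collapse the newline and indentation inside multi-line descriptions.
descriptions = [" ".join(d.split()) for d in pattern.findall(contents)]
print(descriptions)
# ['Log in And see the dashboard so that returning users land on their home page', 'Log out']
```

Because [ \t]+ requires at least one space or tab, a step keyword that starts in column 0 would not end the description.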

\bScenario: Match Scenario: preceded by a word boundary
( Capture group 1
- .* Match 0+ times any char except a newline
- (?: Non capturing group
  - \r?\n Match a newline
  - (?! Negative lookahead, assert that what follows is not [ \t]+(?:And|[WT]hen|Given), i.e. 1+ spaces or tabs followed by one of the keywords
  - ).* Close the lookahead and match 0+ times any char except a newline
- )* Close group and repeat 0+ times
) Close capture group
\r?\n[ \t]+ Match a newline and 1+ spaces or tabs
(?:And|[WT]hen|Given) Match any of the listed keywords
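The same pattern can also be written with re.VERBOSE so that each piece carries the comment from the breakdown above. This is only a readability variant, assuming Python's re module: in verbose mode unescaped whitespace is ignored except inside a character class, so the space in [ \t] still counts.

```python
import re

SCENARIO_RE = re.compile(r"""
    \bScenario:                 # Scenario: preceded by a word boundary
    (                           # capture group 1: the description
        .*                      # rest of the Scenario line
        (?:
            \r?\n               # a newline ...
            (?![ \t]+(?:And|[WT]hen|Given))  # ... not followed by an indented step keyword
            .*                  # the whole next line
        )*                      # repeat for every description line
    )
    \r?\n[ \t]+                 # newline plus indentation of the first step
    (?:And|[WT]hen|Given)       # the first step keyword
""", re.VERBOSE)

sample = "  Scenario: Log out\n    Given I am logged in\n"
print(SCENARIO_RE.findall(sample))  # [' Log out']
```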

