Find first ReGex pattern following a different pattern

Question

Objective: find a second pattern and consider it a match only if it is the first time the pattern was seen following a different pattern.

Background:

I am using Python 2.7 and its re module.

I have a specific Regex match that I am having trouble with. I am trying to get the text between the square brackets in the following sample.

  Sample comments:

    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]

    [    ]

I need the line:

98 g/m2 Ctrl (No IP) 95 min 340oC

The problem is the undetermined number of spaces, tabs, and newlines between the search pattern Sample comments: and the text I want to match.

Best Attempt:

I am able to match the first part easily,

match = re.findall(r'Sample comments:[.+\n+]+', string)

But I can't extend the match far enough to grab the portion between the square brackets,

match = re.findall(r'Sample comments:[.+\n+]+\[(.+)\]', string)

My Thinking:

Is there a way to use regex to find the first instance of the pattern \[(.+)\] after a match of the pattern Sample comments:? Or is there a more robust way to find the bit between the square brackets in my example case?

Thanks,

Michael

Wiktor Stribiżew · Accepted Answer

I suggest using

r'Sample comments:\s*\[(.*?)\s*]'

See the regex and IDEONE demo

The point is that \s* matches zero or more whitespace characters, both vertical (line breaks) and horizontal. See the Python re reference:

\s
When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v]. The LOCALE flag has no extra effect on matching of the space. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.
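
To illustrate the quoted definition, here is a minimal sketch showing \s* running across spaces, a tab, and line breaks in one go. The string gap is a made-up stand-in for the whitespace between Sample comments: and the opening bracket, not data from the question.

```python
import re

# Hypothetical gap: two spaces, a tab, two newlines, four spaces, then "[".
gap = "  \t\n\n    ["

m = re.match(r'\s*', gap)
consumed = m.end()         # how many characters \s* matched
next_char = gap[consumed]  # the first character \s* could not match

# Single-argument print works the same in Python 2.7 and 3.
print("%d %s" % (consumed, next_char))
```

Because \s covers \n as well as spaces and tabs, no separate handling of line breaks is needed.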

Pattern details:

Sample comments: - a sequence of literal chars
\s* - 0 or more whitespaces
\[ - a literal [
(.*?) - Group 1 (returned by re.findall) capturing zero or more characters other than a newline, as few as possible, up to the first...
\s* - 0+ whitespaces and
] - a literal ] (note it does not have to be escaped outside the character class).
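
Putting the pieces above together, the following is a small sketch of the suggested pattern applied to the sample from the question. The variable names (text, pattern, matches, comment) are illustrative, and the exact whitespace in text is an assumption about how the real input looks.

```python
import re

# Sample input modelled on the question; the blank lines and indentation are assumed.
text = (
    "Sample comments:\n"
    "\n"
    "    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]\n"
    "\n"
    "    [    ]\n"
)

pattern = re.compile(r'Sample comments:\s*\[(.*?)\s*]')

# findall returns the Group 1 text for every occurrence of the whole pattern.
matches = pattern.findall(text)

# search stops at the first occurrence, which is enough if only one is expected.
first = pattern.search(text)
comment = first.group(1) if first else None

print(matches)
print(comment)
```

The second bracket pair is not returned because it does not directly follow Sample comments:, and the trailing spaces before ] are dropped because the lazy group hands them to the final \s*.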
