Michael Molter

Reputation: 1326

Find the first regex pattern following a different pattern

Objective: find a second pattern and count it as a match only if it is the first occurrence of that pattern after a different (anchor) pattern.

Background:

I am using the Python 2.7 re module.

I am having trouble with a specific regex match: I am trying to get the text between the square brackets in the following sample.

  Sample comments:

    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]

    [    ]

I need the line:

98 g/m2 Ctrl (No IP) 95 min 340oC

The problem is the undetermined number of whitespace characters (spaces, tabs, and newlines) between the search pattern Sample comments: and the match I want.

Best Attempt:

I can match the first part easily:

match = re.findall(r'Sample comments:[.+\n+]+', string)

But I can't extend the match far enough to grab the portion between the square brackets:

match = re.findall(r'Sample comments:[.+\n+]+\[(.+)\]', string)
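A quick check shows why the second attempt comes back empty. The sample string below is a hypothetical stand-in for the data above; the four-space indentation before each [ is an assumption about the real input. Inside a character class, . and + are literals, so [.+\n+] matches only a period, a plus sign, or a newline, never a space or tab.

```python
import re

# Hypothetical sample; the indentation before "[" is an assumption.
text = "Sample comments:\n\n    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]\n\n    [    ]\n"

# [.+\n+] is a character class: it matches ".", "+" or "\n" and nothing else.
first = re.findall(r'Sample comments:[.+\n+]+', text)
# The class cannot consume the indentation spaces, so "\[" never gets a chance.
second = re.findall(r'Sample comments:[.+\n+]+\[(.+)\]', text)

print(repr(first))   # the match stops right before the first space
print(second)        # []
```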

My Thinking:

Is there a way to use regex to find the first instance of the pattern \[(.+)\] after a match of the pattern Sample comments:? Or is there a more robust way to find the text between the square brackets in my example case?
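One literal reading of "the first instance after a match" is a two-step search: locate the anchor, then start a second search where the anchor ended. The sketch below does that with a compiled pattern's search(string, pos), which begins scanning at index pos. The helper name first_bracket_after and the sample string are hypothetical, and this is an alternative illustration, not the approach used in either answer.

```python
import re

# Hypothetical sample; the indentation before "[" is an assumption.
text = "Sample comments:\n\n    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]\n\n    [    ]\n"

BRACKET = re.compile(r'\[([^\]]*)\]')  # "[", then anything but "]", then "]"

def first_bracket_after(text, anchor):
    """Hypothetical helper: contents of the first [...] after `anchor`, or None."""
    anchor_match = re.search(re.escape(anchor), text)
    if anchor_match is None:
        return None
    # Start the second search where the anchor match ended.
    m = BRACKET.search(text, anchor_match.end())
    if m is None:
        return None
    return m.group(1).strip()  # drop the padding inside the brackets

print(first_bracket_after(text, "Sample comments:"))
```

Because the bracket search starts after the anchor, any bracketed text appearing earlier in the string is ignored.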

Thanks,

Michael

Upvotes: 0

Views: 122

Answers (2)

Wiktor Stribiżew

Reputation: 626853

I suggest using

r'Sample comments:\s*\[(.*?)\s*]'

See the regex and IDEONE demo

The point is that \s* matches zero or more whitespace characters, both vertical (line breaks) and horizontal (spaces, tabs). From the Python re reference:

\s
When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v]. The LOCALE flag has no extra effect on matching of the space. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.
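A small check of the quoted behavior: each of the six listed characters matches \s, so a single \s* consumes a mixed run of spaces, tabs, and newlines. The test strings are illustrative only.

```python
import re

# The six ASCII whitespace characters listed in the re documentation.
whitespace = ' \t\n\r\f\v'
all_matched = all(re.match(r'\s', ch) for ch in whitespace)

# \s* stops at the first non-whitespace character ("[" here).
run = re.match(r'\s*', ' \t\n  \n\t[').group()

print(all_matched)
print(repr(run))
```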

Pattern details:

  • Sample comments: - a sequence of literal chars
  • \s* - 0 or more whitespaces
  • \[ - a literal [
  • (.*?) - Group 1 (returned by re.findall) capturing 0 or more characters other than a newline, as few as possible, up to the first...
  • \s* - 0 or more whitespaces and
  • ] - a literal ] (note it does not have to be escaped outside the character class).
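Applied to a hypothetical copy of the sample (the indentation before [ is an assumption), the suggested pattern returns the line without its padding. The lazy (.*?) stops as soon as \s*] can match, so the trailing spaces are left to \s*.

```python
import re

# Hypothetical sample; the indentation before "[" is an assumption.
text = "Sample comments:\n\n    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]\n\n    [    ]\n"

pattern = r'Sample comments:\s*\[(.*?)\s*]'
# With one capturing group, findall returns the Group 1 text of each match.
matches = re.findall(pattern, text)
print(matches)  # ['98 g/m2 Ctrl (No IP) 95 min 340oC']
```

If only the first occurrence is needed, re.search(pattern, text).group(1) returns the same string without building a list.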

Upvotes: 3

Pierre

Reputation: 6237

Not sure if I understand your problem correctly, but re.findall('Sample comments:[^\\[]*\\[([^\\]]*)\\]', string) seems to work.

Or maybe re.findall('Sample comments:[^\\[]*\\[[ \t]*([^\\]]*?)[ \t]*\\]', string) if you want to strip the trailing spaces from your line.
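The two patterns above, rewritten as equivalent raw strings and run against a hypothetical copy of the sample (the indentation before [ is an assumption), show the difference: the first keeps the padding inside the brackets, while the second trims spaces and tabs on both sides.

```python
import re

# Hypothetical sample; the indentation before "[" is an assumption.
text = "Sample comments:\n\n    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]\n\n    [    ]\n"

# [^\[]* skips everything up to the first "[", not only whitespace.
loose = re.findall(r'Sample comments:[^\[]*\[([^\]]*)\]', text)
# [ \t]* on both sides plus the lazy ([^\]]*?) trims the padding.
trimmed = re.findall(r'Sample comments:[^\[]*\[[ \t]*([^\]]*?)[ \t]*\]', text)

print(repr(loose[0]))    # keeps the trailing spaces
print(repr(trimmed[0]))  # '98 g/m2 Ctrl (No IP) 95 min 340oC'
```

Unlike \s*, the [^\[]* prefix also skips non-whitespace text between the label and the bracket, and [^\]] can cross newlines where . cannot.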

Upvotes: 0
