Reputation: 1326
Objective: find a second pattern and consider it a match only if it is the first time the pattern was seen following a different pattern.
Background:
I am using Python 2.7 regular expressions (the re module).
I have a specific regex match that I am having trouble with: I am trying to get the text between the square brackets in the following sample.
Sample comments:
[98 g/m2 Ctrl (No IP) 95 min 340oC ]
[ ]
I need the line:
98 g/m2 Ctrl (No IP) 95 min 340oC
The problem is the unknown number of spaces, tabs, and newlines between the search pattern 'Sample comments:' and the text I want to match.
Best Attempt:
I am able to match the first part easily:
match = re.findall(r'Sample comments:[.+\n+]+', string)
But I can't extend the match far enough to grab the portion between the square brackets:
match = re.findall(r'Sample comments:[.+\n+]+\[(.+)\]', string)
My Thinking:
Is there a way to use regex to find the first instance of the pattern \[(.+)\] after a match of the pattern 'Sample comments:'? Or is there a more robust way to find the text between the square brackets in my example?
Thanks,
Michael
Upvotes: 0
Views: 122
Reputation: 626853
I suggest using
r'Sample comments:\s*\[(.*?)\s*]'
See the regex and IDEONE demo
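A minimal, self-contained sketch of the suggested pattern in use. The sample string is an assumption: it recreates the question's input with an arbitrary mix of newlines, spaces and a tab, because the exact whitespace in the original post is not known.

```python
import re

# Assumed sample input: the real whitespace is unknown, so this mixes
# a newline, spaces and a tab before the first "[" on purpose.
text = ("Sample comments:\n"
        "  \t[98 g/m2 Ctrl (No IP) 95 min 340oC   ]\n"
        "    [      ]\n")

pattern = r'Sample comments:\s*\[(.*?)\s*]'

# findall returns only the Group 1 captures. The second "[ ]" block is not
# matched because it does not directly follow "Sample comments:".
matches = re.findall(pattern, text)
print(matches)  # ['98 g/m2 Ctrl (No IP) 95 min 340oC']
```

Because (.*?) is lazy, it stops at the first position where \s*] can match, so the trailing spaces before ] are left out of the group.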
The point is that \s* matches zero or more whitespace characters, both vertical (line breaks) and horizontal (spaces, tabs). From the Python re reference:
\s
When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v]. The LOCALE flag has no extra effect on matching of the space. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.
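A small check of why this matters for the question's attempts (the test string is made up). Inside a character class such as [.+\n+], the '.' and '+' lose their special meaning, so the class only matches a literal dot, a literal plus sign, or a newline, whereas \s also covers spaces and tabs.

```python
import re

# Made-up input: a newline, two spaces and a tab sit between the label and "[".
text = "Sample comments:\n  \t[98 g/m2 Ctrl (No IP) 95 min 340oC ]"

# Inside [...], '.' and '+' are literal, so [.+\n+]+ only consumes dots,
# plus signs and newlines; here it stops at the first space.
class_prefix = re.match(r'Sample comments:[.+\n+]+', text).group()
print(repr(class_prefix))  # 'Sample comments:\n'

# Consequently the question's second pattern finds nothing on this input.
attempt = re.findall(r'Sample comments:[.+\n+]+\[(.+)\]', text)
print(attempt)  # []

# \s* consumes spaces, tabs and newlines alike, so \[ can match next.
ws_prefix = re.match(r'Sample comments:\s*\[', text).group()
print(repr(ws_prefix))  # 'Sample comments:\n  \t['
```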
Pattern details:
- Sample comments: - a sequence of literal chars
- \s* - 0 or more whitespaces
- \[ - a literal [
- (.*?) - Group 1 (returned by re.findall) capturing 0+ any chars but a newline, as few as possible, up to the first...
- \s* - 0+ whitespaces and
- ] - a literal ] (note it does not have to be escaped outside the character class).
Upvotes: 3
Reputation: 6237
Not sure if I understand your problem correctly, but this seems to work:
re.findall('Sample comments:[^\\[]*\\[([^\\]]*)\\]', string)
Or, if you want to strip the leading and trailing spaces and tabs from your line:
re.findall('Sample comments:[^\\[]*\\[[ \t]*([^\\]]*?)[ \t]*\\]', string)
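A minimal sketch running both patterns from this answer side by side. The sample string is an assumption (the exact whitespace in the question is unknown); the patterns are copied as written, as ordinary non-raw strings.

```python
import re

# Assumed sample input with three trailing spaces before the first "]".
text = ("Sample comments:\n"
        "  \t[98 g/m2 Ctrl (No IP) 95 min 340oC   ]\n"
        "    [      ]\n")

# First pattern: [^\[]* skips everything up to the first "[", and
# ([^\]]*) captures everything up to the next "]", trailing spaces included.
keep_spaces = re.findall('Sample comments:[^\\[]*\\[([^\\]]*)\\]', text)

# Second pattern: [ \t]* on both sides plus the lazy ([^\]]*?) keep the
# surrounding spaces and tabs out of the captured group.
stripped = re.findall('Sample comments:[^\\[]*\\[[ \t]*([^\\]]*?)[ \t]*\\]', text)

print(keep_spaces)  # ['98 g/m2 Ctrl (No IP) 95 min 340oC   ']
print(stripped)     # ['98 g/m2 Ctrl (No IP) 95 min 340oC']
```

One design difference from the \s*-based answer: [^\[]* skips any text, not only whitespace, up to the next "[", and [^\]] also matches newlines, so a bracketed value spanning several lines would still be captured.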
Upvotes: 0