I'm writing a Python regex that parses the content of a heading. However, the greedy quantifier is not working well, and the non-greedy quantifier is not working at all.
My input is:
Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:
What I'm trying to do is extract the step number and the heading, excluding the trailing ':'.
I've tried multiple regex patterns and came up with these two:
r1 = r"Step ?([0-9]+) ?(.*) ?:?"
r2 = r"Step ?([0-9]+) ?(.*?) ?:?"
r1 captures the step number, but it also captures the ':' at the end.
r2 captures the step number, but the heading group is just an empty string (''). I'm not sure how to handle a .* that is followed by more of the pattern.
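To make the two results concrete, here is a small reproduction of what r1 and r2 return on the three sample lines. It assumes the lines are stored in one multi-line string and searched with re.findall; the variable names text, greedy and lazy are my own.

```python
import re

# The three sample headings from the question, one per line.
text = """Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:"""

r1 = r"Step ?([0-9]+) ?(.*) ?:?"
r2 = r"Step ?([0-9]+) ?(.*?) ?:?"

# Greedy (.*) runs to the end of the line ('.' does not match a newline),
# so the trailing ':' lands in group 2; ' ?:?' then matches the empty string.
greedy = re.findall(r1, text)

# Lazy (.*?) first tries zero characters; ' ?:?' can match the empty string
# right there, so the match succeeds immediately with group 2 == ''.
lazy = re.findall(r2, text)

print(greedy)
print(lazy)
```

Running it shows group 2 ending in ':' for lines 1 and 3 with r1, and group 2 as '' on every line with r2.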
Edit: The heading might contain ':' inside the string; I just want to ignore the trailing one. I know I can use strip(':'), but I want to understand what I'm doing wrong.
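One way to see what goes wrong: in both patterns everything after the heading group (' ?:?') is optional, so it can match the empty string anywhere. The greedy .* therefore keeps the colon, and the lazy .*? is satisfied with zero characters. A lazy quantifier only grows when the part after it fails until it does. A minimal sketch, assuming each heading sits on its own line, gives it that requirement by anchoring the line with ^ and $ under re.MULTILINE. The name r3 and the fourth input line (a heading with an inner colon) are made up for illustration.

```python
import re

# The question's sample lines plus one invented line (Step 4) whose
# heading contains an inner colon.
text = """Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:
Step 4 Note: keep the colon inside"""

# Hypothetical variant of r2: ^ and $ (with re.MULTILINE) pin the match to a
# whole line, so the lazy (.*?) must keep growing until only an optional
# space and an optional ':' remain before the end of the line.
r3 = r"^Step ?([0-9]+) ?(.*?) ?:?$"

headings = re.findall(r3, text, flags=re.MULTILINE)
for number, title in headings:
    print(number, title)
```

The loop prints each step number with its heading, trailing colon removed, and the Step 4 heading keeps its inner colon.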
You can write the pattern with a negated character class, without the non-greedy quantifier and the optional trailing parts:
\bStep ?(\d+) ?([^:\n]+)
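- "\bStep ?" matches the word Step (at a word boundary) and an optional space
- "(\d+) ?" captures 1+ digits in group 1, followed by an optional space
- "([^:\n]+)" captures 1+ characters other than ':' or a newline in group 2

To require that the heading (plus an optional colon) runs to the end of the string, anchor it with $; for multi-line input, pass re.MULTILINE so that $ matches at the end of each line: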
\bStep ?(\d+) ?([^:\n]+):?$
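A quick check of both patterns above, sketched with re.findall on the question's three sample lines. The extra "Step 4" line is an invented input, added to show how the negated class [^:\n] behaves when the heading itself contains a colon, which the question's edit mentions. The variable names are my own.

```python
import re

text = """Step 1 Introduce The Assets:
Step2 Verifying the Assets
Step 3Making sure all the data is in the right place:"""

first = r"\bStep ?(\d+) ?([^:\n]+)"
anchored_pattern = r"\bStep ?(\d+) ?([^:\n]+):?$"

# Group 2 stops at the first ':' or newline, so no trailing colon is captured.
unanchored = re.findall(first, text)

# re.MULTILINE lets $ match at the end of every line, not just the string.
anchored = re.findall(anchored_pattern, text, flags=re.MULTILINE)

# Hypothetical input with an inner colon: group 2 is cut short at 'Note',
# and the anchored pattern cannot reach $ on this line, so it finds nothing.
inner = "Step 4 Note: keep the colon inside"
inner_unanchored = re.findall(first, inner)
inner_anchored = re.findall(anchored_pattern, inner, flags=re.MULTILINE)

print(unanchored)
print(anchored)
print(inner_unanchored, inner_anchored)
```

On the three sample lines both patterns return the same (number, heading) tuples; on the invented Step 4 line the first pattern stops at 'Note' and the anchored one returns no match.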