sysuser

Reputation: 1100

Python RE: Non-greedy matches, repetition and grouping

I am trying to match repeated line patterns using Python's re module.

input_string:

start_of_line: x
line 1
line 2
start_of_line: y
line 1
line 2
line 3
start_of_line: z
line 1

Basically, I want to extract strings in a loop, each one starting at start_of_line and running up to (but not including) the next start_of_line.

I can easily solve this with a for loop, but I am wondering whether there is a Python RE that does it. I tried my best but got stuck on the grouping part.

The closest thing to a solution I have is

import re

pattern = re.compile(r"start_of_line:.*?", re.DOTALL)
for match in pattern.findall(input_string):
    print("Match =", match)

But it prints

Match = start_of_line:
Match = start_of_line:
Match = start_of_line:

If I do anything else to group, I lose the matches.

Upvotes: 4

Views: 533

Answers (1)

Casimir et Hippolyte

Reputation: 89557

To do this with a regex, you need a lookahead:

r"start_of_line:.*?(?=start_of_line|$)"

Otherwise, because you use a lazy quantifier (*?), you get the shortest possible match, i.e. nothing after start_of_line:. Keep the re.DOTALL flag from your code so that the dot can still match newlines.
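
A minimal sketch of how the lookahead pattern might be used. The input is retyped from the question and assumed to have no trailing newline; the names lookahead_pattern and lookahead_blocks are only illustrative:

```python
import re

# Sample input retyped from the question (assumed: no trailing newline).
input_string = "\n".join([
    "start_of_line: x",
    "line 1",
    "line 2",
    "start_of_line: y",
    "line 1",
    "line 2",
    "line 3",
    "start_of_line: z",
    "line 1",
])

# re.DOTALL lets "." cross newlines; the lookahead stops each lazy match
# just before the next marker, or at the end of the string.
lookahead_pattern = re.compile(r"start_of_line:.*?(?=start_of_line|$)", re.DOTALL)
lookahead_blocks = lookahead_pattern.findall(input_string)

for block in lookahead_blocks:
    print("Match =", repr(block))
```

Each block except the last keeps the newline that precedes the next start_of_line, so you may want to call rstrip() on the results.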

Another way:

r"start_of_line:(?:[^\n]+|\n(?!start_of_line:))*"

Here I use a negated character class that matches anything but a newline (\n), repeated one or more times. When the regex engine finds a newline, it checks that start_of_line: doesn't follow. The group is repeated zero or more times.

This pattern is more efficient than the first because the lookahead is performed only when a newline is encountered (versus at every character).
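
A sketch of the same extraction with this second pattern, again on input retyped from the question (assumed to have no trailing newline). No re.DOTALL flag is needed here because the pattern consumes newlines explicitly; the names unrolled_pattern and unrolled_blocks are illustrative:

```python
import re

# Sample input retyped from the question (assumed: no trailing newline).
input_string = "\n".join([
    "start_of_line: x",
    "line 1",
    "line 2",
    "start_of_line: y",
    "line 1",
    "line 2",
    "line 3",
    "start_of_line: z",
    "line 1",
])

# [^\n]+ consumes the rest of a line in one step; a newline is consumed
# only when it is not followed by the next "start_of_line:" marker.
unrolled_pattern = re.compile(r"start_of_line:(?:[^\n]+|\n(?!start_of_line:))*")
unrolled_blocks = unrolled_pattern.findall(input_string)

for block in unrolled_blocks:
    print("Match =", repr(block))
```

Unlike the lookahead version, the matches here stop before the newline that precedes the next start_of_line, so no block ends with a newline.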

Upvotes: 3
