Non-greedy lookahead regex

Question

I have a file I need to extract some data from, in Python. Its structure is as follows:

.I 1
.T
some multiline text
.A
some multiline text
.B
some multiline text
.W
some multiline text
.I 2
.T
some multiline text
.A
some multiline text
.B
some multiline text
.W
some multiline text

As you can see, there are some repeating patterns. I need to extract them one by one. This is my regex:

\.I\s(\d*)    # .I section
\.T
([\d\D]*?)    # .T section
\.A
([\d\D]*?)    # .A section
\.B
([\d\D]*?)    # .B section
\.W
([\d\D]*)     # .W section
(?=\.I\s+\d+)     # look ahead section, which behaves greedy

Everything works except the last section (the lookahead), which behaves greedily. I need a non-greedy lookahead, but I couldn't find one.

Quantifiers can be made non-greedy with *?, +? and {m,n}?, but I couldn't find an equivalent syntax for (?=...).

When I search for a match with this regex, it only finds one match while there are two. This is because of the greedy nature of the lookahead operator. How can I have a non-greedy lookahead?

Julien Spronck · Accepted Answer

I fail to see why the greediness of the lookahead is important. I would expect the following to work:

\.I\s(\d*)

\.T
([\d\D]*?)
\.A
([\d\D]*?)
\.B
([\d\D]*?)
\.W
([\d\D]*?)
(?=\.I\s+\d+|$)
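
For anyone who wants to try this, here is a minimal sketch of the pattern above next to the one from the question, run in verbose mode against made-up sample data. The `\s+` after the record number is my own addition, assuming the pattern is compiled with `re.VERBOSE` (which ignores the literal newlines in the pattern). The names `SAMPLE`, `LAZY`, `GREEDY` and `parse_records` are only illustrative.

```python
import re

# Made-up sample shaped like the file in the question.
SAMPLE = (
    ".I 1\n.T\ntitle one\n.A\nauthor one\n.B\nsource one\n"
    ".W\nabstract one\nmore text\n"
    ".I 2\n.T\ntitle two\n.A\nauthor two\n.B\nsource two\n"
    ".W\nabstract two\n"
)

# Lazy final group, and a lookahead that also accepts end of input.
LAZY = re.compile(
    r"""
    \.I\s(\d+)\s+        # .I line; the trailing whitespace match is an assumption
    \.T([\d\D]*?)        # .T section
    \.A([\d\D]*?)        # .A section
    \.B([\d\D]*?)        # .B section
    \.W([\d\D]*?)        # .W section, lazy
    (?=\.I\s+\d+|$)      # stop before the next record or at end of input
    """,
    re.VERBOSE,
)

# Same shape as the question: greedy final group, no end-of-input alternative.
GREEDY = re.compile(
    r"""
    \.I\s(\d+)\s+
    \.T([\d\D]*?)
    \.A([\d\D]*?)
    \.B([\d\D]*?)
    \.W([\d\D]*)         # greedy: runs to the last place the lookahead succeeds
    (?=\.I\s+\d+)
    """,
    re.VERBOSE,
)


def parse_records(text, pattern=LAZY):
    """Return one tuple (number, T, A, B, W) per record, whitespace-stripped."""
    return [
        tuple(part.strip() for part in match.groups())
        for match in pattern.finditer(text)
    ]


records = parse_records(SAMPLE)
greedy_records = parse_records(SAMPLE, GREEDY)

print(len(records))         # 2
print(len(greedy_records))  # 1
```

With the greedy version, the second record can never match on this sample, because nothing follows it for the lookahead to find. That is why the `|$` alternative matters as much as the `*?`.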

Now that I think about it, Wiktor Stribiżew is right: a lookahead cannot be greedy or lazy. Either it matches at a given position or it does not, and how much text it would match does not matter.
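
A small sketch of that point, using a toy string of my own rather than the data from the question: changing the quantifier inside the lookahead leaves the match unchanged, while changing the quantifier in front of it does not.

```python
import re

text = "aaabbb"

# Quantifier inside the lookahead: greedy or lazy, the overall match is identical,
# because the lookahead only answers yes or no and consumes nothing.
inside_greedy = re.search(r"a+(?=b+)", text)
inside_lazy = re.search(r"a+(?=b+?)", text)
print(inside_greedy.span(), inside_lazy.span())  # (0, 3) (0, 3)

# Quantifier before the lookahead: this is where greediness changes the result.
before_greedy = re.search(r".+(?=b)", text)
before_lazy = re.search(r".+?(?=b)", text)
print(before_greedy.group())  # aaabb
print(before_lazy.group())    # aaa
```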
