Fartab

Reputation: 5503

Non-greedy lookahead regex

I have a file I need to extract some data from, in Python. Its structure is as follows:

.I 1
.T
some multiline text
.A
some multiline text
.B
some multiline text
.W
some multiline text
.I 2
.T
some multiline text
.A
some multiline text
.B
some multiline text
.W
some multiline text

As you can see, there are some repeating patterns. I need to extract them one by one. This is my regex:

\.I\s(\d*)\n       # .I section
\.T\n([\d\D]*?)    # .T section
\.A\n([\d\D]*?)    # .A section
\.B\n([\d\D]*?)    # .B section
\.W\n([\d\D]*)     # .W section
(?=\.I\s+\d+)     # look ahead section, which behaves greedy

Everything works except the last part (the lookahead), which seems to behave greedily. I need a non-greedy lookahead, but I couldn't find a syntax for one.

We can make quantifiers non-greedy with *?, +? and {m,n}?, but I couldn't find an equivalent syntax for (?=...).

When I search for a match with this regex, it only finds one match while there are two. This is because of the greedy nature of the lookahead operator. How can I have a non-greedy lookahead?

Upvotes: 4

Views: 7626

Answers (1)

Julien Spronck

Reputation: 15433

I fail to see why the greediness of the lookahead matters. I would expect the following to work:

\.I\s(\d*)\n
\.T\n([\d\D]*?)
\.A\n([\d\D]*?)
\.B\n([\d\D]*?)
\.W\n([\d\D]*?)
(?=\.I\s+\d+|$)

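Now that I think about it, Wiktor Stribiżew is right. A lookahead cannot be greedy or lazy: it either matches at the current position or it does not, and it consumes no characters, so what it matches does not matter. The two changes that matter in the pattern above are making the .W group lazy (*? instead of *) and adding |$ to the lookahead, so that the last record, which is not followed by another .I line, can still match.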
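Here is a minimal sketch of how this could be checked in Python. The sample text is made up to mirror the structure in the question, and the names `text`, `original`, `fixed` and `records` are only illustrative. The patterns are compiled with re.VERBOSE, which I am assuming is how the commented regex in the question was used.

```python
import re

# Made-up sample data with the same layout as in the question.
text = (
    ".I 1\n"
    ".T\ntitle one\n"
    ".A\nauthor one\n"
    ".B\nsource one\n"
    ".W\nabstract one\nsecond line\n"
    ".I 2\n"
    ".T\ntitle two\n"
    ".A\nauthor two\n"
    ".B\nsource two\n"
    ".W\nabstract two\n"
)

# Pattern from the question: greedy .W group, lookahead requires a next record.
original = re.compile(
    r"""
    \.I\s(\d*)\n
    \.T\n([\d\D]*?)
    \.A\n([\d\D]*?)
    \.B\n([\d\D]*?)
    \.W\n([\d\D]*)
    (?=\.I\s+\d+)
    """,
    re.VERBOSE,
)

# Pattern from this answer: lazy .W group, lookahead also accepts end of input.
fixed = re.compile(
    r"""
    \.I\s(\d*)\n        # .I section
    \.T\n([\d\D]*?)     # .T section
    \.A\n([\d\D]*?)     # .A section
    \.B\n([\d\D]*?)     # .B section
    \.W\n([\d\D]*?)     # .W section, now lazy
    (?=\.I\s+\d+|$)     # stop before the next record or at end of input
    """,
    re.VERBOSE,
)

original_count = len(list(original.finditer(text)))

# Strip each field so trailing newlines do not differ between records.
records = [tuple(f.strip() for f in m.groups()) for m in fixed.finditer(text)]

print(original_count)  # number of records found by the question's pattern
for rec in records:
    print(rec)
```

On this sample the question's pattern finds only the first record, because nothing follows the second one for the lookahead to match, while the fixed pattern finds both. Note that without re.MULTILINE, $ also matches just before a final newline, so the last .W field may come back without its trailing newline; stripping the fields hides that difference.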

Upvotes: 3
