Reputation: 5503
I have a file I need to extract some data from, in Python. Its structure is as follows:
.I 1
.T
some multiline text
.A
some multiline text
.B
some multiline text
.W
some multiline text
.I 2
.T
some multiline text
.A
some multiline text
.B
some multiline text
.W
some multiline text
As you can see, there are some repeating patterns. I need to extract them one by one. This is my regex:
\.I\s(\d*)\n # .I section
\.T\n([\d\D]*?) # .T section
\.A\n([\d\D]*?) # .A section
\.B\n([\d\D]*?) # .B section
\.W\n([\d\D]*) # .W section
(?=\.I\s+\d+) # look ahead section, which behaves greedy
Everything is OK except the last section (the lookahead), which is greedy. I need a non-greedy lookahead, but I couldn't find a syntax for one.
We can apply non-greedy behavior using *?, +?, and {m,n}?, but I couldn't find such a syntax for (?=...)
When I search for a match with this regex, it only finds one match while there are two. This is because of the greedy nature of the lookahead operator. How can I have a non-greedy lookahead?
Upvotes: 4
Views: 7626
Reputation: 15433
I fail to see why the greediness of the lookahead is important. I would expect the following to work:
\.I\s(\d*)\n
\.T\n([\d\D]*?)
\.A\n([\d\D]*?)
\.B\n([\d\D]*?)
\.W\n([\d\D]*?)
(?=\.I\s+\d+|$)
Now that I think about it, I think that Wiktor Stribiżew is right. A lookahead cannot be greedy or lazy: either there is a match or there is not, and what it matches does not matter.
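For what it's worth, here is a minimal sketch comparing the pattern from the question with the one above. The sample text and the names SAMPLE, ORIGINAL_RE, FIXED_RE and extract are made up for illustration, and I am assuming the patterns are compiled with re.VERBOSE, since they are written with inline comments. The two changes are the lazy quantifier in the .W group and the |$ alternative in the lookahead, which lets the last record match even though no .I follows it.

```python
import re

# Hypothetical sample data following the structure shown in the question.
SAMPLE = """\
.I 1
.T
title one
.A
author one
.B
book one
.W
words one
line two
.I 2
.T
title two
.A
author two
.B
book two
.W
words two
"""

# Pattern from the question: greedy .W group, lookahead needs a following ".I".
ORIGINAL_RE = re.compile(r"""
    \.I\s(\d*)\n        # .I section
    \.T\n([\d\D]*?)     # .T section
    \.A\n([\d\D]*?)     # .A section
    \.B\n([\d\D]*?)     # .B section
    \.W\n([\d\D]*)      # .W section (greedy)
    (?=\.I\s+\d+)       # must be followed by another record
""", re.VERBOSE)

# Pattern from this answer: lazy .W group, lookahead also accepts end of input.
FIXED_RE = re.compile(r"""
    \.I\s(\d*)\n        # .I section
    \.T\n([\d\D]*?)     # .T section
    \.A\n([\d\D]*?)     # .A section
    \.B\n([\d\D]*?)     # .B section
    \.W\n([\d\D]*?)     # .W section (lazy)
    (?=\.I\s+\d+|$)     # next record, or end of input
""", re.VERBOSE)


def extract(pattern, text):
    """Return one (id, T, A, B, W) tuple per record, whitespace stripped."""
    return [
        tuple(field.strip() for field in match.groups())
        for match in pattern.finditer(text)
    ]


original_records = extract(ORIGINAL_RE, SAMPLE)
fixed_records = extract(FIXED_RE, SAMPLE)

print(len(original_records))  # number of records found by the question's pattern
print(len(fixed_records))     # number of records found by the answer's pattern
for record in fixed_records:
    print(record)
```

On this sample the question's pattern only picks up the first record, because the last record has no .I after it for the lookahead to match. With more than two records, the greedy .W group would also run up to the last .I and swallow the records in between.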
Upvotes: 3