I am trying to extract some strings from a legal text where the patterns are repeated several times.
I am not sure I understand how the lazy quantifier (?) works. From what I read, it is supposed to capture a match using as few characters as possible. However, it doesn't seem to do that in my example below:
Sorry for the text in Spanish, but I think it is simple enough to follow.
...por la afirmativa.los señores jueces doctores genoud, hitters, de lazzari, roncoroni y soria, por los mismos fundamentos de la señora jueza doctora kogan, votaron la primera cuestion planteada tambien por la negativa.a la tercera cuestion planteada, la señora jueza doctora kogan dijo:..(text)...voto por la afirmativa.los señores jueces doctores genoud e hitters, por los mismos fundamentos de la señora jueza doctora kogan, votaron la tercera cuestion planteada por la afirmativa.a la tercera cuestion planteada, el señor juez doctor de lazzari dijo:...
I am trying to capture the text between the second occurrence of "los señores jueces" and "votaron la tercera cuestion planteada por la afirmativa". Because "los señores jueces" appears twice, once near the beginning and once shortly before the end phrase, there are two possible starting points for this pattern.
So I tried to use the lazy quantifier (.*?) to get the shorter of the two matches:
(los señores jueces(.*?)votaron la tercera cuestion planteada por la afirmativa)
But it doesn't seem to work: it matches the longest string, starting from the first occurrence of "los señores jueces" rather than from the second (shorter) one. I am testing the regex on https://regex101.com/
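A quick way to see what is going on, sketched in Python's re module (the language is my choice; regex101 was the original test bed). The sample string is the one above, kept on a single line. The first part shows that laziness does shorten a match on the right; the second shows that it never moves the start of the match.

```python
import re

# Sample text from the question, kept on a single line (no "\n"), so "." matches every character.
text = (
    "...por la afirmativa.los señores jueces doctores genoud, hitters, de lazzari, "
    "roncoroni y soria, por los mismos fundamentos de la señora jueza doctora kogan, "
    "votaron la primera cuestion planteada tambien por la negativa.a la tercera "
    "cuestion planteada, la señora jueza doctora kogan dijo:..(text)...voto por la "
    "afirmativa.los señores jueces doctores genoud e hitters, por los mismos "
    "fundamentos de la señora jueza doctora kogan, votaron la tercera cuestion "
    "planteada por la afirmativa.a la tercera cuestion planteada, el señor juez "
    "doctor de lazzari dijo:..."
)
START = "los señores jueces"
END = "votaron la tercera cuestion planteada por la afirmativa"

# 1) With a fixed starting point, laziness does shorten the match on the right.
print(re.search(r"a(.*)b", "a1b2b").group(1))   # greedy -> 1b2
print(re.search(r"a(.*?)b", "a1b2b").group(1))  # lazy   -> 1

# 2) It never moves the start: the engine tries start positions from left to
#    right and reports the first one at which the whole pattern can match.
m = re.search(re.escape(START) + "(.*?)" + re.escape(END), text)
print(m.start() == text.find(START))  # True: the match begins at the FIRST occurrence
print(START in m.group(1))            # True: the second occurrence is swallowed by (.*?)
```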
I'd appreciate any help with this.
Thanks.
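The lazy quantifier does not change where a match starts. The regex engine tries each starting position from left to right and reports the first one at which the whole pattern can match. Since your pattern already matches from the first "los señores jueces", that is where the match begins; the ? only makes it end at the earliest "votaron la tercera cuestion planteada por la afirmativa" after that start.
Use a negative lookahead to force the regex engine to check, before matching each character, that the string "los señores jueces" does not begin at that position: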
los señores jueces((?:(?!los señores jueces).)*?)votaron la tercera cuestion planteada por la afirmativa
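A minimal sketch of the pattern above in Python's re module (the language is my choice; the pattern itself is unchanged). re.escape is added only as a precaution, since neither phrase contains special characters.

```python
import re

# Same sample text as in the question (single line, no "\n").
text = (
    "...por la afirmativa.los señores jueces doctores genoud, hitters, de lazzari, "
    "roncoroni y soria, por los mismos fundamentos de la señora jueza doctora kogan, "
    "votaron la primera cuestion planteada tambien por la negativa.a la tercera "
    "cuestion planteada, la señora jueza doctora kogan dijo:..(text)...voto por la "
    "afirmativa.los señores jueces doctores genoud e hitters, por los mismos "
    "fundamentos de la señora jueza doctora kogan, votaron la tercera cuestion "
    "planteada por la afirmativa.a la tercera cuestion planteada, el señor juez "
    "doctor de lazzari dijo:..."
)
START = "los señores jueces"
END = "votaron la tercera cuestion planteada por la afirmativa"

# (?:(?!START).)*? consumes one character at a time, but only if START does not
# begin at that position, so a match can never contain a second START.
start_re = re.escape(START)
pattern = re.compile(start_re + r"((?:(?!" + start_re + r").)*?)" + re.escape(END))

m = pattern.search(text)
print(m.start() == text.rfind(START))  # True: the match begins at the second occurrence
print(m.group(1).strip())
# -> doctores genoud e hitters, por los mismos fundamentos de la señora jueza doctora kogan,
```

If the real text spans several lines, a plain "." will not cross the line breaks; in that case pass re.DOTALL (the "s" flag on regex101).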