Ben Kean

Reputation: 306

Unexpected Behavior of Regex in Perl with Lookahead/Lookbehind

On https://regexr.com/67r1h, using the PCRE engine, my regex correctly matches all unescaped double-quote characters.

/(?<!^|,|(?<!^|,)")"(?!,|$|"(?!,|$))/gm

test data:

one,"George "Georgie"","Washington"
""two"",""Johnny" John","Adams"
"""three""","""Tommy"" Thomas ""BigT""","Jefferson"
"four","Sinead","O"Connor"

The expected output is that lines 1, 2, and 4 match. Line 3 should not match because every double quote in it is either a field delimiter or a properly escaped (doubled) double quote.

However, when executing the following command in Bash using perl v5.30 and v5.34, I do not get any matches:

echo 'one,"George "Georgie"","Washington"' | perl -ne '/(?<!^|,|(?<!^|,)")"(?!,|$|"(?!,|$))/gm && print'

output:

Variable length lookbehind is experimental in regex; marked by <-- HERE in m/(?<!^|,|(?<!^|,)")"(?!,|$|"(?!,|$)) <-- HERE / at -e line 1.

I'm at a loss as to why Perl will not match this regex. Is there something wrong with my Bash command or with my regex? Can the PCRE implementation handle nested lookaheads/lookbehinds while the Perl implementation cannot?

Upvotes: 1

Views: 113

Answers (1)

ikegami

Reputation: 385764

This is a bug in Perl, probably in the experimental variable-length lookbehind feature.

Simpler case:

$ echo ' "G' | perl -M5.010 -ne'say /(?<!^|,|(?<!^|,)")"(?!,|$|"(?!,|$))/ || 0'
Variable length lookbehind is experimental in regex; marked by <-- HERE in m/(?<!^|,|(?<!^|,)")"(?!,|$|"(?!,|$)) <-- HERE / at -e line 1.
0

Tested using the latest version (5.34.0).

Upvotes: 2
