MSohnius

Reputation: 33

How does Perl regexp anchor $ actually handle a trailing newline?

I recently discovered some unexpected behaviour for the end-of-string anchor $ in a Perl Regular Expression (Perl 5.26.1 x86_64 on OpenSuse 15.2).

Supposedly, $ refers to the end of the string, not to the end of a line as it does in grep(1). Hence a \n at the end of a string should have to be matched explicitly. However, the following (complete) program:

my @strings = ( 
  "hello world",
  "hello world\n",
  "hello world\t"
);
my $i = 0;
foreach (@strings) {
  $i++;
  print "$i: >>$_<<\n" if /d$/;
}

produces this output:

1: >>hello world<<
2: >>hello world
<<

That is, /d$/ matches not only the first of the three strings but also the second, with its trailing newline. On the other hand, as expected, the regexp /d\n$/ matches only the second string, and /d\s$/ matches the second and third.
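To see all three patterns side by side, here is a small sketch that runs each of them against the question's three strings and records which strings match. The strings are taken from the question; the %patterns and %matches hashes are only an illustrative way of organising the check, not anything from the original program.

```perl
use strict;
use warnings;

# The three test strings from the question.
my @strings = ("hello world", "hello world\n", "hello world\t");

# The three patterns discussed in the question, compiled with qr//.
my %patterns = (
    'd$'   => qr/d$/,
    'd\n$' => qr/d\n$/,
    'd\s$' => qr/d\s$/,
);

# For each pattern, record the 1-based numbers of the strings it matches.
my %matches;
for my $name (sort keys %patterns) {
    my @hits = grep { $strings[$_ - 1] =~ $patterns{$name} } 1 .. @strings;
    $matches{$name} = \@hits;
    print "/$name/ matches strings: @hits\n";
}
```

This should report strings 1 and 2 for /d$/, only string 2 for /d\n$/, and strings 2 and 3 for /d\s$/, which is exactly the behaviour described above.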

What's going on here?

Upvotes: 2

Views: 132

Answers (2)

zdim

Reputation: 66964

As already stated, the $ metacharacter does match at the end of the string, but it also matches just before a newline that is the last character of the string. Note that with the /m (multiline) modifier it additionally matches before every internal newline in a multi-line string.
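A minimal sketch of the /m difference, using a made-up two-line string (the text and the (\w+ line) pattern are purely illustrative): without /m, $ only anchors at the end or before the final newline, so only the last line is found; with /m it also anchors before the internal newline.

```perl
use strict;
use warnings;

# A hypothetical two-line string with a trailing newline.
my $text = "first line\nsecond line\n";

# Without /m: $ matches only at the very end or before the final newline,
# so only "second line" can sit directly in front of it.
my @plain = $text =~ /(\w+ line)$/g;

# With /m: $ also matches before every internal newline.
my @multi = $text =~ /(\w+ line)$/mg;

print "without /m: @plain\n";
print "with /m:    @multi\n";
```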

There are also assertions that let you fine-tune exactly where the end of the match is anchored:

  • \z matches only at the very end of the string, never before a trailing newline. The /m flag does not change this.

  • \Z matches at the end of the string and also before a newline at the end of the string, regardless of the /m flag. So it is like $, except that it never matches before newlines internal to a multi-line string, not even with /m.
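The three end-of-string anchors can be compared directly with a short sketch like the following. The test strings are illustrative (no newline, one trailing newline, one internal newline), and each anchor is tried both without and with /m; a 1 in the printed pattern means "matched".

```perl
use strict;
use warnings;

# Illustrative inputs: no newline, a trailing newline, an internal newline.
my @cases = ("hello world", "hello world\n", "hello world\nmore");

# Each anchor compiled without and with the /m modifier.
my %anchors = (
    '$'  => [qr/d$/,  qr/d$/m],
    '\Z' => [qr/d\Z/, qr/d\Z/m],
    '\z' => [qr/d\z/, qr/d\z/m],
);

my %result;
for my $name (sort keys %anchors) {
    my ($plain, $multi) = @{ $anchors{$name} };
    for my $i (0 .. $#cases) {
        # Store 1 for a match, 0 for no match.
        $result{$name}{plain}[$i] = $cases[$i] =~ $plain ? 1 : 0;
        $result{$name}{multi}[$i] = $cases[$i] =~ $multi ? 1 : 0;
    }
    printf "%-3s  without /m: %s   with /m: %s\n", $name,
        join('', @{ $result{$name}{plain} }),
        join('', @{ $result{$name}{multi} });
}
```

With these inputs, $ gives 110 without /m and 111 with it, \Z gives 110 either way, and \z gives 100 either way, matching the descriptions in the list above.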

These "zero-width" assertions match a position, not characters.
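One way to see that $ matches a position rather than the newline itself is to look at the match offsets in the special arrays @- and @+. In this sketch (the variable names are illustrative), /d$/ matches "hello world\n" but the match covers only the "d"; a substitution on the same pattern leaves the newline untouched.

```perl
use strict;
use warnings;

my $s = "hello world\n";    # length 12; the newline is at offset 11

# Match d followed by the $ assertion and record the match boundaries.
my ($start, $end, $matched);
if ($s =~ /d$/) {
    ($start, $end) = ($-[0], $+[0]);    # start and end offsets of the whole match
    $matched = substr($s, $start, $end - $start);
}

# $ consumed nothing, so replacing the match keeps the trailing newline.
(my $copy = $s) =~ s/d$/D/;

printf "matched '%s' at offsets %d..%d\n", $matched, $start, $end;
```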

Upvotes: 3

toolic

Reputation: 62236

The perlre documentation says of the $ metacharacter:

Match the end of the string
(or before newline at the end of the string;

This means that a string ending in d followed by a single trailing \n (newline) still matches /d$/, because $ is satisfied at the position just before that final newline.
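The allowance is narrow: $ (without /m) skips at most one newline, and only if it is the very last character. A quick sketch with a few illustrative strings shows that two trailing newlines, or anything between d and the newline, defeat the match.

```perl
use strict;
use warnings;

my @cases = (
    "hello world",       # d at the very end
    "hello world\n",     # d followed by a single trailing newline
    "hello world\n\n",   # d followed by two newlines: $ cannot skip both
    "hello world \n",    # a space between d and the trailing newline
);

# 1 if /d$/ matches, 0 otherwise.
my @hits = map { $_ =~ /d$/ ? 1 : 0 } @cases;
print join(' ', @hits), "\n";
```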

Upvotes: 1
