osiv

Reputation: 77

Why does a match against regex $ return 1 when the input string contains a newline?

Why does the command

perl -e "print qq/a\n/ =~ /$/"

print 1?

As far as I know, Perl treats $ as both the position before a \n and the position at the end of the whole string only in multi-line mode, which I assumed is the default (no modifier is applied).

Upvotes: 1

Views: 258

Answers (3)

brian d foy

Reputation: 132913

The match operator returns 1 as the true value because the pattern matched. The print outputs that value.

The $ is an anchor, which is a specific sort of zero-width assertion. It matches a condition in the pattern but consumes no text. Since you have nothing else in the pattern, the /$/ matches any target string including the empty string. It will always return true.
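As a quick check of that claim, here is a minimal sketch that runs /$/ against a few sample strings. The list of targets is only an illustration and is not taken from the question:

```perl
use strict;
use warnings;

# Illustrative targets, including the empty string and strings with newlines.
my @targets = ("", "a", "a\n", "a\nb", "\n\n");

# Keep every target that /$/ matches; the anchor consumes no text.
my @matched = grep { /$/ } @targets;

printf "%d of %d targets matched\n", scalar @matched, scalar @targets;
```

Every string has an end, so every target should be kept.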

The $ is the end-of-line anchor, as documented in perlre. The $ allows a vestigial newline at the end, so both of these can match:

"a"   =~ /a$/
"a\n" =~ /a$/

Without the /m regex modifier, the end of the line is the end of the string (or just before a final newline). With that modifier, $ can also match before any newline in the string:

"a\nb" =~ /a$/m
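To see the effect of /m in isolation, a small sketch along these lines may help. The sample string and variable names are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample: an interior newline after "a", a final newline after "b".
my $text = "a\nb\n";

# Without /m, $ matches only at the very end or before a final newline,
# so the "a" in front of the interior newline does not qualify.
my $plain = $text =~ /a$/  ? 1 : 0;

# With /m, $ can also match before the interior newline.
my $multi = $text =~ /a$/m ? 1 : 0;

# "b" sits before the final newline, so it matches even without /m.
my $last  = $text =~ /b$/  ? 1 : 0;

print "plain=$plain multi=$multi last=$last\n";
```

This should print plain=0 multi=1 last=1.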

You might get this behavior even if you don't see it attached to the particular match operator since people can set default match flags:

use re '/m'; # applies to all in lexical scope

Over-enthusiastic fans of Perl Best Practices like to make a trio of pattern-changing flags the default (often without auditing every regex it affects):

use re '/msx';

There's another anchor, the end-of-string anchor \Z, that also allows a trailing newline. If you don't want to allow a newline, you can use the lowercase \z to mean the absolute end of the string. These are not affected by regex flags.
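A short sketch comparing the three anchors on the string from the question, assuming nothing beyond core Perl (the variable names are made up for the example):

```perl
use strict;
use warnings;

my $s = "a\n";

# $ and \Z both tolerate one trailing newline; \z does not.
my $dollar  = $s =~ /a$/  ? 1 : 0;
my $big_z   = $s =~ /a\Z/ ? 1 : 0;
my $small_z = $s =~ /a\z/ ? 1 : 0;

# Adding /m changes what $ can do, but not \Z or \z.
my $big_z_m   = $s =~ /a\Z/m ? 1 : 0;
my $small_z_m = $s =~ /a\z/m ? 1 : 0;

print "dollar=$dollar Z=$big_z z=$small_z Zm=$big_z_m zm=$small_z_m\n";
```

This should print dollar=1 Z=1 z=0 Zm=1 zm=0.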

Upvotes: 2

Lee Duhem

Reputation: 15121

Here is how the following command works:

perl -e "print qq/a\n/ =~ /$/"
  1. print provides a list context, and
  2. in list context, m// returns the list (1) on success if there are no parentheses in the pattern, and
  3. $ matches at the end of the string (or before a newline at the end of the string).
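The steps above can be checked with a small sketch like the following. The variable names are hypothetical, and /x is used only as an example of a pattern that fails on this string:

```perl
use strict;
use warnings;

my $str = "a\n";

# List context, no capture groups: a successful match returns the list (1).
my @list = ($str =~ /$/);

# Scalar context: a successful match returns a true value, which prints as 1.
my $scalar = ($str =~ /$/);

# A failed match in list context returns the empty list.
my @fail = ($str =~ /x/);

print "list=(@list) scalar=$scalar failed=" . scalar(@fail) . "\n";
```

This should print list=(1) scalar=1 failed=0, which is why print, which supplies list context, outputs 1.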

Upvotes: 0

simbabque

Reputation: 54371

It prints 1 because there is a match. An ordinary pattern match like m// stops after the first match, and returns 1 because that's a true value.

Given that, it doesn't matter whether it matches your "a\n" after the a or after the \n character. In either case, there's a match, so it's true, and that's represented by 1.

You can take a deeper look with use re 'debug'.

Compiling REx "$"
Final program:
   1: EOL (2)
   2: END (0)
anchored ""$ at 0 minlen 0 
Matching REx "$" against "a%n"
   1 <a> <%n>                |  1:EOL(2)
   1 <a> <%n>                |  2:END(0)
Match successful!
Freeing REx: "$"

That's all there is to it.

Upvotes: 4
