line anchor behavior with perl regex

Question

I recently wrote a little Perl script to trim whitespace from the end of lines and ran into unexpected behavior. I decided that Perl must include line-end characters when breaking up lines, so I tested that theory and got even more unexpected behavior. The line "I do not" should match either \s+$ or t$, not both. Very confused. Can anyone enlighten me?

£ cat example
I have space after me
I do not
£ perl -ne 'print if /\s+$/' example
I have a space after me
I do not
£ perl -ne 'print if /t$/' example
I do not
£

PCRE tester gives expected results. I've also tried the /m suffix with no change in behavior.

edit. for completeness:

£ perl -ne 'print if /e$/' example
£

Expected behavior from perl -ne 'print if...' was the same as grep -P:

£ grep -P '\s+$' example
I have a space after me
£

Can repro under Ubuntu 16.04 perl v5.22.1 (both 60 and 68 patch version) and MINGW perl v5.26.1.

Eugen Konkov · Accepted Answer

You see this behavior because in the example file the second line has a \n character at the end. \n is whitespace, so it is matched by \s.

perlretut

no modifiers: Default behavior. ... '$' matches only at the end or before a newline at the end.

In your regex, \s matches a whitespace character, the set [\ \t\v\r\n\f]. In other words, it matches spaces and the \n character. Then $ matches the end of the line (no characters, just the position itself), much as the word anchor \b matches a word boundary and ^ matches the beginning of the line rather than its first character.

You could rewrite your regex like this:

/[ \t]+$/

The output of cat example would look like this if the second line didn't end with a \n character:

£ cat example
I have space after me
I do not£

Notice that the shell prompt £ is not on the next line.

The results are different because grep abstracts out line endings, like Perl's -l flag. (grep -P '\n' will return no results on a text file, whereas grep -Pz '\n' will.)
