Reputation: 2981
I recently wrote a little Perl script to trim whitespace from the end of lines and ran into unexpected behavior. I suspected that Perl includes the line-ending characters when it splits input into lines, so I tested that theory and got even more unexpected behavior.
The line "I do not" should match either \s+$ or t$, not both. Very confused. Can anyone enlighten me?
£ cat example
I have space after me
I do not
£ perl -ne 'print if /\s+$/' example
I have a space after me
I do not
£ perl -ne 'print if /t$/' example
I do not
£
A PCRE tester gives the expected results. I've also tried the /m modifier, with no change in behavior.
Edit, for completeness:
£ perl -ne 'print if /e$/' example
£
I expected perl -ne 'print if ...' to behave the same as grep -P:
£ grep -P '\s+$' example
I have a space after me
£
Can repro under Ubuntu 16.04 perl v5.22.1 (both 60 and 68 patch version) and MINGW perl v5.26.1.
Upvotes: 1
Views: 283
Reputation: 25263
You see this behavior because the second line of the example file ends with a \n character, and \n is whitespace, so it is matched by \s.
no modifiers: Default behavior. ... '$' matches only at the end or before a newline at the end.
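In your regex, \s matches any whitespace character, the set [\ \t\v\r\n\f]. In other words, it matches the trailing \n as well as spaces. The $ anchor then matches at the end of the line; it consumes no characters and only asserts a position, just as \b matches a word boundary and ^ matches the beginning of the line rather than its first character.
So for the line "I do not\n", \s+ matches the \n and $ matches at the very end of the string, while in t$ the $ matches just before the final \n. That is why both patterns match.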
You could rewrite your regex like this:
/[\t ]+$/
The content of example would look like this if the second line didn't end with a \n character:
£ cat example
I have space after me
I do not£
Notice that the shell prompt £ is not on the next line.
The grep results are different because grep abstracts away line endings, much like Perl's -l flag does. (grep -P '\n' returns no results on a text file, whereas grep -Pz '\n' does.)
Upvotes: 5
Reputation: 18980
Your problems stem from the combination of the -n option and \s. The -n flag reads the input line by line into $_, leaving the trailing newline in place, and runs the print if /.../ statement for each line.
In your match you use the $ anchor to match the end of the line. The anchor is purely positional and does not consume the newline or any other character.
You can check this yourself with \s+: whether or not you add a $, the trailing run of whitespace that \s+ matches is the same number of characters.
This is because \s is equal to [\r\n\t\f\v ] and matches any whitespace character, and you have added the + quantifier, so it matches one or more times, as many times as possible (greedy).
If you search just for trailing space characters instead, you are fine: [ ]+$ (the space is written inside a character class here to make it visible):
£ perl -ne 'print if /[ ]+$/' example
That way it does not match the \n, as \s does.
Bonus:
Here are some common Perl one-liners to trim spaces:
# Strip leading whitespace (spaces, tabs) from the beginning of each line
perl -ple 's/^[ \t]+//'
# Same, but strips any leading whitespace character
perl -ple 's/^\s+//'
# Strip trailing whitespace (spaces, tabs) from the end of each line
perl -ple 's/[ \t]+$//'
# Strip whitespace from the beginning and end of each line
perl -ple 's/^[ \t]+|[ \t]+$//g'
Upvotes: 2