StackNewb

Reputation: 3

Perl-style regex works on regex101.com, but not on the command line

I have a problem matching text with a Perl-style regex on the command line, whereas the same regex works as expected on regex101.com. For example, I have a file test.txt with 5 lines that was created with gedit on Ubuntu 22.04.2 LTS and saved with Linux/Unix line endings. The lines in the file are as follows:

One  
Two   
Apple  
Car  
Plane

When I search on regex101.com using '\n' as the pattern, regex101.com finds 4 matches (i.e., 4 lines contain the pattern, namely the first four lines). However, when I run the same query with Perl on the command line:

perl -ne 'print if /\n/' test.txt

the command outputs all 5 lines as if they all contain a match, which, in my opinion, should not happen, since the last (5th) line does not contain a newline character.

Moreover, if I search for '\nA', regex101.com correctly highlights the newline character at the end of the second line, which is followed by the A at the start of the third line.

However, the following command outputs nothing, as if it did not match what is obviously present in the text and was identified by regex101.com:

perl -ne 'print if /\nA/' test.txt

Finally, if I search for something nonsensical, such as

perl -ne 'print if /\n$$$$/' test.txt

the output is all five lines of test.txt, which is obviously not correct. Can someone shed some light on what I am doing wrong here?

Upvotes: 0

Views: 251

Answers (1)

tobyink

Reputation: 13664

There are really three parts to this question:

1. Why is this matching all five lines?

perl -ne 'print if /\n/' test.txt

The answer is that gedit adds a newline character at the end of the file for you, so the fifth line does end with \n after all. Try using a different editor.

SciTE and PulsarEdit are two editors that definitely allow you to create files without a final newline character. (Though in most circumstances, text files ought to end with a newline character!)

2. Why is this matching zero lines?

perl -ne 'print if /\nA/' test.txt

Because -n causes Perl to process the input one line at a time, and in each line it processes, the \n is the last character, so it is never followed by an uppercase A.

3. Why does this match all the lines?

perl -ne 'print if /\n$$$$/' test.txt

Because "\n" appears at the end of each line. Multiple $ characters are allowed because $ is a zero-width assertion, so it doesn't consume any of the string being matched. That is, the end of the line is matched, but the match position isn't moved forward, so the end of the line can be matched again.

Upvotes: 4
