zeroparallax

Reputation: 53

Perl regex not matching a string that contains a newline character (\n)

I'm trying to use perl (v5.14.2) via a bash shell (GNU Bash-4.2) in Kubuntu (GNU/Linux) to search and replace a string that includes a newline character, but I'm not succeeding yet.

Here's the text file I'm searching:

<!-- filename: prac1.html -->

hello
kitty

blah blah blah

When I use a text editor's (Kate's) search-and-replace functionality or when I use a regex tester (http://regexpal.com/), I can easily get this regex to work:

hello\nkitty

But when using perl in the command line, none of the following commands have worked:

perl -p -i -e 's,hello\nkitty,newtext,' prac1.html
perl -p -i -e 's,hello.kitty,newtext,s' prac1.html
perl -p -i -e 's,hello.*kitty,newtext,s' prac1.html
perl -p -i -e 's,hello[\S\s]kitty,newtext,' prac1.html
perl -p -i -e 's,hello[\S\s]*kitty,newtext,' prac1.html

Actually, I got desperate and tried many other patterns, including all of these (different permutations in the "single-line" and "multi-line" modes):

perl -p -i -e 's,hello\nkitty,newtext,' prac1.html
perl -p -i -e 's,hello.kitty,newtext,' prac1.html
perl -p -i -e 's,hello\nkitty,newtext,s' prac1.html
perl -p -i -e 's,hello.kitty,newtext,s' prac1.html
perl -p -i -e 's,hello\nkitty,newtext,m' prac1.html
perl -p -i -e 's,hello.kitty,newtext,m' prac1.html
perl -p -i -e 's,hello\nkitty,newtext,ms' prac1.html
perl -p -i -e 's,hello.kitty,newtext,ms' prac1.html

perl -p -i -e 's,hello[\S\s]kitty,newtext,' prac1.html
perl -p -i -e 's,hello[\S\s]*kitty,newtext,' prac1.html
perl -p -i -e 's,hello$[\S\s]^kitty,newtext,' prac1.html
perl -p -i -e 's,hello$[\S\s]*^kitty,newtext,' prac1.html
perl -p -i -e 's,hello[\S\s]kitty,newtext,s' prac1.html
perl -p -i -e 's,hello[\S\s]*kitty,newtext,s' prac1.html
perl -p -i -e 's,hello$[\S\s]^kitty,newtext,s' prac1.html
perl -p -i -e 's,hello$[\S\s]*^kitty,newtext,s' prac1.html
perl -p -i -e 's,hello[\S\s]kitty,newtext,m' prac1.html
perl -p -i -e 's,hello[\S\s]*kitty,newtext,m' prac1.html
perl -p -i -e 's,hello$[\S\s]^kitty,newtext,m' prac1.html
perl -p -i -e 's,hello$[\S\s]*^kitty,newtext,m' prac1.html
perl -p -i -e 's,hello[\S\s]kitty,newtext,ms' prac1.html
perl -p -i -e 's,hello[\S\s]*kitty,newtext,ms' prac1.html
perl -p -i -e 's,hello$[\S\s]^kitty,newtext,ms' prac1.html
perl -p -i -e 's,hello$[\S\s]*^kitty,newtext,ms' prac1.html

(I also tried \r, \r\n, \R, \f, \D, etc., as well as global mode.)

Can anyone spot the issue or suggest a solution?

Upvotes: 5

Views: 3225

Answers (2)

Gilles Quénot

Reputation: 185219

Try this. It works by changing the input record separator (a newline by default), so that perl reads the file a paragraph at a time instead of a line at a time:

perl -i -p00e 's,hello\nkitty,newtext,' prac1.html

From perldoc perlrun:

-0[octal/hexadecimal]

specifies the input record separator ($/) as an octal or hexadecimal number. If there are no digits, the null character is the separator. Other switches may precede or follow the digits. For example, if you have a version of find which can print filenames terminated by the null character, you can say this:

find . -name '*.orig' -print0 | perl -n0e unlink

The special value 00 will cause Perl to slurp files in paragraph mode. Any value 0400 or above will cause Perl to slurp files whole, but by convention the value 0777 is the one normally used for this purpose.

Upvotes: 14

Bluby

Reputation: 331

The problem is that "-p" implicitly wraps the following loop around your "-e" code, and "<>" reads the input one line at a time, so your regexp never gets a chance to see more than one line.

  LINE:
    while (<>) {
        ...             # your program goes here
    } continue {
        print or die "-p destination: $!\n";
    }

See the perlrun manpage for more information.

Upvotes: 6
