PLT

Reputation: 21

Substitution Method - How to find two tags with a newline in between using Perl?

I want to find a newline between two lines in an HTML file and merge those lines. I tried the substitution operator in Perl. Here is the input and the code I tried.

File.txt

<body>
<p>God is great</p>
<p>He lives everywhere</p>
.......
</body>

Expected output:

file.html

<body>
<p>God is great He lives everywhere</p>
.......
</body>

Code: I tried to use a substitution to merge the two lines:

print "Enter the filename: ";
chomp($file=<STDIN>);
open my $in,  '<', "$file.txt" or die "Can't read old file: $!";
open my $out, '>', "$file.html" or die "Can't write new file: $!";
while( <$in> )
{
s/(.+)<\/p>\n<p>/$1 /gs;
print $out $_;
}
close $in;
close $out;

But this is not working. How can I fix it?

Upvotes: 2

Views: 67

Answers (3)

clt60

Reputation: 63922

The following script:

use 5.010;
use warnings;

my $html = do { local $/; <DATA> };
$html =~ s:</p>\n<p>: :igs;
say $html;

__DATA__
<body>

<p>par1</p><p>par2 </p>
<p> par3</p><p>par4</p>

<p> par5 </p>

<p>par6</p>

</body>

produces:

<body>

<p>par1</p><p>par2   par3</p><p>par4</p>

<p> par5 </p>

<p>par6</p>

</body>

If you change the regex to:

$html =~ s:</p>\s*<p>: :igs;

you will get:

<body>

<p>par1 par2   par3 par4  par5  par6</p>

</body>

and so on.

The main points:

  • slurp the whole file into a variable
  • substitute, with:
    • i - ignore case, to match both <p> and <P>
    • g - replace every occurrence in the string, and
    • s - treat the string as a single line (this only makes . match newlines, so it has no effect on this particular pattern, which contains no .)
  • substitute with a single space, because otherwise you get concatenated strings. For example,

$html =~ s:</p>\s*<p>::igs;

produces:

<body>

<p>par1par2  par3par4 par5 par6</p>

</body>

Note, for example, that par1 and par2 are run together.
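The outputs above still contain runs of two or three spaces, because the spaces inside the paragraphs are kept. If that matters, one possible variation is to let the pattern absorb the whitespace on both sides of the tag boundary. This is only a sketch: join_paragraphs is a made-up helper name, not something from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the original answer):
# join adjacent paragraphs and absorb the whitespace around the boundary,
# so exactly one space is left between the merged texts.
sub join_paragraphs {
    my ($html) = @_;
    $html =~ s{\s*</p>\s*<p>\s*}{ }gi;
    return $html;
}

my $sample = "<p>par1</p><p>par2 </p>\n<p> par3</p>\n";
print join_paragraphs($sample);   # prints: <p>par1 par2 par3</p>
```

Whether trimming is wanted depends on the data, since it also removes whitespace that was deliberately placed at the start or end of a paragraph.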

To slurp your own file, replace <DATA> with your own file handle, for example <$in>.
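Put together with real file handles, it could look roughly like this. The merge_file helper and the command-line usage are assumptions for illustration; the slurp and the substitution are the same as above (without the s flag, which makes no difference for this pattern).

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions): read $in_path
# in one go, merge adjacent paragraphs, and write the result to $out_path.
sub merge_file {
    my ($in_path, $out_path) = @_;

    open my $in, '<', $in_path or die "Can't read $in_path: $!";
    my $html = do { local $/; <$in> };   # $/ is undef here, so <$in> returns the whole file
    close $in;

    # Replace each "</p>", optional whitespace, "<p>" sequence with one space.
    $html =~ s:</p>\s*<p>: :ig;

    open my $out, '>', $out_path or die "Can't write $out_path: $!";
    print {$out} $html;
    close $out or die "Can't close $out_path: $!";
    return;
}

# Assumed usage: perl merge.pl File.txt file.html
merge_file(@ARGV) if @ARGV == 2;
```

The local $/ is scoped to the do block, so line-by-line reading elsewhere in the program is not affected.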

Upvotes: 1

RobEarl

Reputation: 7912

You're reading the file one line at a time, so each string ends at its \n and the <p> on the next line is never in the same string. The pattern </p>\n<p> therefore can never match. Instead of using a while loop, you can read the file in one go:

my $html = do { local $/; <$in> };

Then do the substitution:

$html =~ s#</p>\n<p># #g;
print $out $html;

Notice I'm using an alternative delimiter for the substitution to avoid having to escape the /.
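To see the difference between the two ways of reading, here is a small self-contained sketch. It uses an in-memory file handle (opened on a reference to a string) as a stand-in for your real file, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "<p>God is great</p>\n<p>He lives everywhere</p>\n";

# Line by line: every string stops at its "\n", so "<p>" never follows it.
open my $by_line, '<', \$text or die "Can't open in-memory file: $!";
my $line_hits = 0;
while (my $line = <$by_line>) {
    $line_hits++ if $line =~ m#</p>\n<p>#;
}
close $by_line;

# Slurped: the whole text is one string, so the pattern can span two lines.
open my $whole, '<', \$text or die "Can't open in-memory file: $!";
my $html = do { local $/; <$whole> };
close $whole;
my $slurp_hits = () = $html =~ m#</p>\n<p>#g;   # count the matches

print "line-by-line matches: $line_hits\n";   # prints 0
print "slurped matches: $slurp_hits\n";       # prints 1
```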

Upvotes: 4

Jens

Reputation: 69450

Add the s modifier to your regex options and it should work:

s/(.+)<\/p>\n<p>/$1 /gs;
                      ^^

Upvotes: 1
