user2837862

Reputation: 51

Eliminating text that includes newlines

Working in Perl and reading a file line by line, I need to remove all text between two specific words (let's say "dog" and "cat"), but I don't know how to do that when several lines separate the two words. I'm trying to use the "s" modifier, which lets the dot (.) match a newline, but it doesn't work:

use warnings;
use strict;
my $filename = shift;
open F, $filename or die "Usage: $0 FILENAME\n";
while(<F>) {
s/dog.*?cat//s;
print;
}
close F;

Upvotes: 1

Views: 68

Answers (4)

mpapec

Reputation: 50657

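while (<F>) {
  # Flip-flop: true from the line where "dog" is cut off through the line where "cat" is found.
  # Assumes "dog" and "cat" are on different lines.
  my $n = s/dog.*//s .. s/.*?cat//;
  $n ||= 0;   # outside the range the operator returns an empty string
  # Print lines outside the range, plus the first ($n == 1) and last ("E0" suffix) lines of it.
  print if $n <= 1 or $n =~ /E/;
}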

Upvotes: 1

Miller

Reputation: 35208

Slurping your file by localizing $/ is going to be your easiest solution. However, if you want to do line-by-line processing, then you just need to keep track of a $state variable:

use strict;
use warnings;
use autodie;

my $filename = shift;
#open my $fh, '<', $filename;

my $state = 0;

while(<DATA>) {
    if ($state == 0 && s/(.*?)dog//) {
        print $1;
        $state = 1;
    }

    if ($state == 1 && s/.*?cat//) {
        $state = 2;
# If you want to handle more than one dog/cat pair, use below code
#       $state = 0;
#       redo;
    }

    if ($state != 1) {
        print;
    }
}

#close $fh;

__DATA__
1 hello world
2 more lines
3 this cat is ignored
4 and yet more
5 this has <dog ... yep, it really does
6 stuff to delete
7 this has cat>, cuz cats rock
8 Filler line
9 more <dogs are ignored.
10 more cat>s
11 more filler
12 yet more filler
13 More <dogs and cat>s and stuff
14 more filler
15 more filler
16 more <dogs and cat>s and <dogs and cat>s, see.
17 ending stuff

Outputs:

1 hello world
2 more lines
3 this cat is ignored
4 and yet more
5 this has <>, cuz cats rock
8 Filler line
9 more <dogs are ignored.
10 more cat>s
11 more filler
12 yet more filler
13 More <dogs and cat>s and stuff
14 more filler
15 more filler
16 more <dogs and cat>s and <dogs and cat>s, see.
17 ending stuff

If you uncomment those two lines so that more than one dog/cat pair is filtered, then you get the following:

1 hello world
2 more lines
3 this cat is ignored
4 and yet more
5 this has <>, cuz cats rock
8 Filler line
9 more <>s
11 more filler
12 yet more filler
13 More <>s and stuff
14 more filler
15 more filler
16 more <>s and <>s, see.
17 ending stuff

Upvotes: 0

Dave Hayes

Reputation: 56

The answer above is correct. I've just dealt with this issue myself. You can try:

use strict;
my $filename = shift;
open F, $filename or die "Usage: $0 FILENAME\n";
my $buffer;
{ 
   local $/;
   $buffer = <F>;
   $buffer =~ s/dog.*?cat//s;
}
print $buffer;

Note that this might have side effects you do not want. Consider the input:

dog foo dog bar cat

Do you want the 'foo' included in what is removed? A regex match starts at the leftmost possible position, so even with the non-greedy .*? the match begins at the first 'dog' and removes 'foo' as well, which may or may not be what you want.

The CPAN module Regexp::Common::balanced can help you iron out the correct way you wish to handle these kinds of edge cases.
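
To make the edge case concrete, here is a small sketch run against the 'dog foo dog bar cat' input. The second pattern is only one possible alternative and is not part of the answer above: it assumes you want to remove the innermost pair, and it forbids another "dog" inside the match. The variable names are illustrative.

```perl
use strict;
use warnings;

my $input = "dog foo dog bar cat";

# Non-greedy match: it still starts at the leftmost "dog",
# so "foo" is removed along with everything up to the first "cat".
(my $leftmost = $input) =~ s/dog.*?cat//s;

# Assumed alternative: each consumed character must not start another "dog",
# so only the innermost dog...cat pair is removed.
(my $innermost = $input) =~ s/dog(?:(?!dog).)*?cat//s;

print "leftmost:  '$leftmost'\n";
print "innermost: '$innermost'\n";
```

The first substitution leaves an empty string; the second leaves 'dog foo '. Which behavior is right depends on how nested or repeated markers should be treated in your data.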

Upvotes: 0

blueygh2

Reputation: 1538

You are reading in your file line by line, then substituting. If you want the whole text at once, set the input record separator to undef with

local $/;

Then, when you do <F>, you get the whole file content, and the substitution should work.
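
As a minimal sketch of that approach, the example below reads from an in-memory filehandle so it runs without a real file. The sample text and variable names are made up for illustration, and the /g flag is an assumption on my part, in case more than one dog/cat span should be removed.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the file contents.
my $text = "keep 1\nstart dog remove\nremove too\ncat end\nkeep 2\n";

# Open an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $content = do {
    local $/;   # undef record separator: the next read returns everything
    <$fh>;
};
close $fh;

# /s lets . match newlines; /g removes every dog...cat span, not only the first.
$content =~ s/dog.*?cat//gs;

print $content;
```

Keeping local $/ inside the do block restores the record separator as soon as the read is done, so later line-by-line reads elsewhere in the program are unaffected.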

Upvotes: 1
