FatDaemon

Reputation: 806

Omitting or excluding Regular Expression matches from a Perl script

Hi, I want to search a file whose contents look similar to this:

Start Cycle
report 1
report 2
report 3
report 4
End Cycle

.... goes on and on..

I want to search for "Start Cycle" and then pull out report 1 and report 3 from the lines that follow it. My regex looks something like this:

(Start Cycle .*\n)(.*\n)(.*\n)(.*\n)

The above regex selects "Start Cycle" and the next three lines, but I want to omit the third line from my result. Is that possible? Or is there an easier way to do this in a Perl script? I am expecting a result like:

Start Cycle
report 1
report 3

Upvotes: 1

Views: 404

Answers (8)

Mike

Reputation: 1851

I took the OP's question as a Perl exercise and came up with the following code. It was just written for learning purposes. Kindly correct me if anything looks suspicious.

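while (<>) {
    if (/Start Cycle/) {
        # Collect the heading plus the next three lines.
        my @block = ($_);
        push @block, scalar(<>) for 1 .. 3;
        # Print the heading, report 1 and report 3 (skipping report 2).
        print @block[0, 1, 3];
    }
}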

Another version (edited; thanks, @FM):

local $/;    # slurp mode: read the whole input as one string
$_ = <>;
my @block = (/(Start Cycle\n)(.+\n).+\n(.+\n)/g);
print @block;

Upvotes: 1

ghostdog74

Reputation: 342413

while (<>) {
    if (/Start Cycle/) {
        print $_;
        $_ = <>;
        print $_;
        $_ = <>; $_ = <>;
        print $_;
    }
}

Upvotes: 0

Sinan Ünür

Reputation: 118128

Update: I did not originally notice that this was just @FM's answer in a slightly more robust and longer form.

#!/usr/bin/perl

use strict; use warnings;

{
    local $/ = "End Cycle\n";
    while ( my $block = <DATA> ) {
        last unless my ($heading) = $block =~ /^(Start Cycle\n)/g;
        print $heading, ($block =~ /([^\n]+\n)/g)[1, 3];
    }
}

__DATA__
Start Cycle
report 1
report 2
report 3
report 4
End Cycle

Output:

Start Cycle
report 1
report 3

Upvotes: 0

FMc

Reputation: 42411

Perhaps a crazy way to do it: alter Perl's understanding of an input record.

$/ = "End Cycle\n";
print( (/(.+\n)/g)[0,1,3] ) while <$file_handle>;

Upvotes: 2

hobbs

Reputation: 239930

If you wanted to leave all of the surrounding code the same but stop capturing the third thing, you could simply remove the parens that cause that line to be captured:

(Start Cycle .*\n)(.*\n).*\n(.*\n)
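
For illustration, here is a minimal sketch of that pattern in use. It assumes the whole file has already been read into one string and that nothing follows "Start Cycle" on the heading line, so the space before the first `.*` is dropped here. The sub name `first_and_third` and the sample data are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the heading, report 1 and report 3
# for every cycle found in $text.
sub first_and_third {
    my ($text) = @_;
    my @picked;
    # The third line (.*\n without parentheses) is matched but not captured,
    # so the remaining groups are numbered $1, $2 and $3.
    while ($text =~ /(Start Cycle.*\n)(.*\n).*\n(.*\n)/g) {
        push @picked, $1, $2, $3;
    }
    return @picked;
}

# Sample input copied from the question.
my $sample = "Start Cycle\nreport 1\nreport 2\nreport 3\nreport 4\nEnd Cycle\n";
print first_and_third($sample);
```

Note that once the parentheses are removed, the last group is `$3` rather than `$4`, so any surrounding code that reads `$4` would need the same renumbering.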

Upvotes: 2

Ether

Reputation: 53966

The following code prints the odd-numbered report lines between Start Cycle and End Cycle:

while (<$filehandle>) {
    if (/Start Cycle/ .. /End Cycle/) {
        print if /report (\d+)/ and $1 % 2;
    }
}
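
As a self-contained sketch of the same idea: in scalar context, `..` acts as a flip-flop that turns true on a line matching the left pattern and stays true through the next line matching the right pattern. The sample input and the `@kept` array are assumptions for the demo; the input is read from an in-memory filehandle so the snippet runs without a file.

```perl
use strict;
use warnings;

# Invented sample data: one cycle surrounded by lines that must be ignored.
my $input = "noise\nStart Cycle\nreport 1\nreport 2\nreport 3\nreport 4\nEnd Cycle\nreport 9\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @kept;
while (my $line = <$fh>) {
    # True from a "Start Cycle" line through the next "End Cycle" line.
    if ($line =~ /Start Cycle/ .. $line =~ /End Cycle/) {
        # Keep only report lines whose number is odd.
        push @kept, $line if $line =~ /report (\d+)/ and $1 % 2;
    }
}
close $fh;
print @kept;
```

The trailing "report 9" line is odd-numbered but falls outside the Start/End pair, so the flip-flop filters it out.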

Upvotes: 5

mollmerx

Reputation: 648

The regex populates $1, $2, $3 and $4 with the contents of each pair of brackets.

So if you just look at the contents of $1, $2 and $4 you have what you want.

Alternatively you can just leave off the brackets from the third line.

Your regex should look something like

/Start Cycle\n(.+)\n.+\n(.+)\n.+\nEnd Cycle/g

In scalar context, the /g flag lets you evaluate the regex repeatedly; each evaluation continues from where the previous match ended and returns the next match.
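
A small sketch of that repeated matching, assuming the data is held in one string. The second cycle and the `@wanted` array are invented for the demo. Because this regex captures only the two report lines, the heading is re-added by hand.

```perl
use strict;
use warnings;

# Two cycles in one string; the second cycle is invented sample data.
my $data = join '',
    "Start Cycle\nreport 1\nreport 2\nreport 3\nreport 4\nEnd Cycle\n",
    "Start Cycle\nreport 5\nreport 6\nreport 7\nreport 8\nEnd Cycle\n";

my @wanted;
# In scalar context, each pass of the loop resumes after the previous match.
while ($data =~ /Start Cycle\n(.+)\n.+\n(.+)\n.+\nEnd Cycle/g) {
    # $1 is the first report line, $2 the third; lines 2 and 4 are not captured.
    push @wanted, "Start Cycle\n$1\n$2\n";
}
print @wanted;
```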

Upvotes: 1

Ivan Nevostruev

Reputation: 28723

You can find the text between the start and end markers, then split that text into lines. Here is an example:

my $text = <<TEXT;
Start Cycle
report 1
report 2
report 3
report 4
End Cycle
TEXT

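## find text between all start/end pairs
while ($text =~ m/^Start Cycle$(.*?)^End Cycle$/msg) {
    my $reports_text = $1;
    ## remove leading whitespace (the newline after "Start Cycle")
    $reports_text =~ s/^\s+//;
    ## split text by newlines
    my @report_parts = split(/\r?\n/, $reports_text);
    ## print the heading, then the first and third reports
    print "Start Cycle\n", join("\n", @report_parts[0, 2]), "\n";
}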

Upvotes: 2
