How to print lines in between two patterns?

Question

I would like to print everything between the lines @cluster t.# has ### elements (including this line) and @cluster t.#+1 has ### elements (preferably omitting this line) from my input file into correspondingly numbered output files (clust(#).txt). The script so far creates the appropriately numbered files, but they are empty.

#!/usr/bin/perl 

use strict;
use warnings;

open(IN,$ARGV[0]);

our $num = 0;

while(my $line = <IN>) {
    if ($line =~ /^\@cluster t has (\d+) elements/) {
        my $clust = "full";
        open (OUT, ">clust$clust.txt");

    } elsif ($line =~ m/^\@cluster t.(\d+.*) has (\d+) elements/) {
        my $clust = $1;
        $num++;
        open (OUT, ">clust$clust.txt");
        print OUT, $_ if (/$line/ ... /$line/);
    }
}

zdim · Accepted Answer

Update: Rearranged so that the version based on my final understanding of the input comes first. Also edited for clarity.

Detect the line that starts a section and open the output file for that section; print every line to the filehandle of the output file that is currently open.

An example input file, data_range.txt, as I understand the input format:

@cluster t.1 has 100 elements
data 1
data 1 1
@cluster t.2 has 200 elements
data 2
@cluster t.3 has 300 elements

Print each @cluster t.N line, and the lines following it up to the next @cluster line, to the file clust(N).txt.

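use warnings;
use strict;

my $file = shift || 'data_range.txt';
open my $fh, '<', $file or die "Can't open $file: $!";

my $fh_out;   # filehandle for the current clust(N).txt

my $clustline = qr/\@cluster t\.([0-9]+) has [0-9]+ elements/;

while (<$fh>)
{
    # A @cluster line starts a new section: open clust(N).txt for it.
    # Opening $fh_out again implicitly closes the previous file.
    if (/$clustline/) {
        my $outfile = "clust($1).txt";
        open $fh_out, '>', $outfile or die "Can't open $outfile: $!";
    }

    print $fh_out $_;
}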

For each @cluster line a new file with the corresponding number is opened; since we reuse the same filehandle, opening it again closes the previous file. That line and all lines following it, up to the next @cluster line, belong to that file and are printed there.
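To see which lines end up in which file without writing anything to disk, the same grouping rule can be run against an in-memory filehandle (open on a reference to a string). This is only a sketch: the sub split_clusters and the hash it returns are hypothetical names for illustration, and lines before the first @cluster line are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: same grouping rule as the loop above, but it
# collects the lines for each cluster number in a hash instead of files.
sub split_clusters {
    my ($fh) = @_;
    my $clustline = qr/\@cluster t\.([0-9]+) has [0-9]+ elements/;
    my (%by_cluster, $current);
    while (my $line = <$fh>) {
        # A marker line starts a new section and belongs to it
        $current = $1 if $line =~ $clustline;
        # Skip anything before the first @cluster line
        next unless defined $current;
        $by_cluster{$current} .= $line;
    }
    return \%by_cluster;
}

my $input = <<'END';
@cluster t.1 has 100 elements
data 1
data 1 1
@cluster t.2 has 200 elements
data 2
@cluster t.3 has 300 elements
END

# Read the sample input from a string through an in-memory filehandle
open my $fh, '<', \$input or die "Can't open in-memory input: $!";
my $clusters = split_clusters($fh);

for my $num (sort { $a <=> $b } keys %$clusters) {
    print "clust($num).txt would contain:\n$clusters->{$num}";
}
```

Run as is, it prints the would-be contents of clust(1).txt, clust(2).txt and clust(3).txt, which match the output files shown for this sample input.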

The code above assumes that the first line of the input is a @cluster line, and that every line belongs to one of the output files. If that may not be the case, we need to be more careful: (1) use a flag that records when writing has started, and (2) add a branch for lines to skip.

use warnings;
use strict;

my $file = shift || 'data_range.txt';
open my $fh, '<', $file or die "Can't open $file: $!";

my $fh_out;
my $started_writing = 0;

my $clustline = qr/\@cluster t\.([0-9]+) has [0-9]+ elements/;

while (<$fh>)
{
    if (/$clustline/) {
        my $fout = "clust($1).txt";
        open $fh_out, '>', $fout or die "Can't open $fout for writing: $!";
        $started_writing = 1;
    }
    elsif (not $started_writing) {   # no output file opened yet
        next;
    }
    elsif (/dont_write_this_line/) { # placeholder: condition for lines to skip altogether
        next;
    }

    print $fh_out $_;
}

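All of this assumes that a @cluster line cannot repeat with the same number. If one did, opening its file again with '>' would truncate it and the earlier section would be lost. So add a test if you aren't sure of your input, or open the output files in append mode (keeping in mind that append mode also keeps whatever earlier runs wrote).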

With either version we get the output file clust(1).txt

@cluster t.1 has 100 elements
data 1
data 1 1

and clust(2).txt

@cluster t.2 has 200 elements
data 2

and clust(3).txt, which contains only the @cluster t.3 line.
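For input where a @cluster number might repeat, one way to add such a test is to remember which numbers have been seen, and either die or switch to append mode on a repeat. This is a minimal sketch that appends with a warning; the sub open_cluster_file is hypothetical, and the demo writes into a temporary directory (File::Temp) only so that it can be run safely.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: open clust(N).txt in $dir, truncating it the first
# time N is seen in this run and appending to it if N shows up again.
sub open_cluster_file {
    my ($dir, $num, $seen) = @_;
    my $file = File::Spec->catfile($dir, "clust($num).txt");
    my $mode = $seen->{$num}++ ? '>>' : '>';
    warn "Cluster $num repeats, appending to $file\n" if $mode eq '>>';
    open my $fh_out, $mode, $file or die "Can't open $file: $!";
    return $fh_out;
}

my $dir = tempdir(CLEANUP => 1);   # scratch directory for this demo
my %seen;
my $fh_out;

# Made-up input in which cluster 1 appears twice
my @lines = ("\@cluster t.1 has 2 elements\n", "a\n",
             "\@cluster t.1 has 2 elements\n", "b\n");

for my $line (@lines) {
    if ($line =~ /\@cluster t\.([0-9]+) has [0-9]+ elements/) {
        close $fh_out if $fh_out;   # flush the previous file first
        $fh_out = open_cluster_file($dir, $1, \%seen);
    }
    print $fh_out $line;
}
close $fh_out;
```

Replacing the '>>' branch with a die turns the same check into a hard error, which may be preferable if a repeated number means the input is corrupt.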

Original version, with the initial understanding of the input and requirements

The range operator is nearly tailor-made for this. In scalar context it keeps track of its true/false state across repeated evaluations. It becomes true once its left operand evaluates true and stays true until its right operand evaluates true; it is still true for that evaluation and becomes false on the next one. There is more to it; see "Range Operators" in perlop.
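One detail from perlop matters here, since the question used the three-dot form. The two-dot .. tests its right operand in the same evaluation in which the left operand became true, so a single line can both open and close the range; the three-dot ... waits until the next evaluation before testing the right operand. A small sketch with made-up lines, where one line matches both the start and the end pattern:

```perl
use strict;
use warnings;

my @lines = ('start end', 'middle', 'end', 'after');

# Collect the lines for which each flip-flop is true
my (@two, @three);
for (@lines) {
    push @two,   $_ if /start/ ..  /end/;   # may open and close on one line
    push @three, $_ if /start/ ... /end/;   # tests /end/ only from the next line on
}
print "..  : @two\n";
print "... : @three\n";
```

Here .. yields only 'start end', while ... yields 'start end middle end'. With the cluster markers in this question the start and end patterns never match the same line, so both forms behave the same.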

A made-up input file, data_range.txt:

@cluster t.1 has 100 elements
@cluster t.2 has 200 elements
@cluster t.3 has 300 elements
@cluster t.4 has 400 elements
@cluster t.5 has 500 elements

Print everything between marker lines 2 and 4, including the starting line but not the ending one.

use warnings;
use strict;

my $file = 'data_range.txt';
open my $fh, '<', $file or die "Can't open $file: $!";

# Build the start and end patterns
my $beg = qr/^\@cluster t\.2 has 200 elements$/;
my $end = qr/^\@cluster t\.4 has 400 elements$/;

while (<$fh>)
{
    # True from the $beg line through the $end line, inclusive
    if (/$beg/ .. /$end/) {
        print if not /$end/;   # leave out the end marker itself
    }
}

This prints lines 2 and 3. The .. operator becomes true when the line ($_) matches $beg and stays true up to and including the line that matches $end; it turns false only for the line after that. So the range includes both the start and the end lines, which is why we also test for the end marker and don't print that line.

If you would rather use the literal marker lines, you can test strings for equality:

my $beg = q(@cluster t.2 has 200 elements);
my $end = q(@cluster t.4 has 400 elements);

while (my $line = <$fh>) 
{
    chomp($line);
    if ($line eq $beg .. $line eq $end) {
        print "$line\n" if $line ne $end;
    }   
}

This works the same way as the regex example above. Note that we now have to chomp, since the trailing newline would make the eq test fail (and then we add "\n" back for printing).
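The range operator's return value can also replace the second test for the end marker in the regex version. Per perlop, in scalar context it returns an empty string while false and a sequence number (1, 2, ...) while true, and the final number of a range has "E0" appended (numerically still the same number, and still true). A sketch of the regex loop using that; it reads the made-up input from an in-memory string so it runs on its own, and @printed exists only to make the result easy to check.

```perl
use strict;
use warnings;

my $input = <<'END';
@cluster t.1 has 100 elements
@cluster t.2 has 200 elements
@cluster t.3 has 300 elements
@cluster t.4 has 400 elements
@cluster t.5 has 500 elements
END

open my $fh, '<', \$input or die "Can't open in-memory input: $!";

my $beg = qr/^\@cluster t\.2 has 200 elements$/;
my $end = qr/^\@cluster t\.4 has 400 elements$/;

my @printed;
while (<$fh>) {
    # $seq is '' (false), then 1, 2, ..., and ends in "E0" on the $end line
    if (my $seq = /$beg/ .. /$end/) {
        next if $seq =~ /E0$/;   # skip the closing marker line
        print;
        push @printed, $_;
    }
}
```

This prints the same two lines as the version that matches $end a second time.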
