I would like to print everything between the lines @cluster t.# has ### elements (including this line) and @cluster t.#+1 has ### elements (preferably omitting this line) from my input file into correspondingly numbered output files (clust(#).txt). The script thus far creates the appropriate numbered files, but without any content.
#!/usr/bin/perl
use strict;
use warnings;
open(IN,$ARGV[0]);
our $num = 0;
while(my $line = <IN>) {
    if ($line =~ /^\@cluster t has (\d+) elements/) {
        my $clust = "full";
        open (OUT, ">clust$clust.txt");
    } elsif ($line =~ m/^\@cluster t.(\d+.*) has (\d+) elements/) {
        my $clust = $1;
        $num++;
        open (OUT, ">clust$clust.txt");
        print OUT, $_ if (/$line/ ... /$line/);
    }
}
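A more concise way is a one-liner using the range operator:
perl -ne 'print if /^foo/ .. /^base/' file.txt
For a file.txt with the following content, it prints the two blocks from foo through base, with both marker lines included: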
Lorem ipsum dolor
sit amet,
consectetur adipiscing
foo
bar
base
elit,
sed do
foo
bar
base
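To check what the one-liner selects without creating file.txt, here is a minimal sketch that applies the same /^foo/ .. /^base/ condition to the sample lines held in an array (the array and the @selected name are only for illustration):

```perl
use strict;
use warnings;

# The sample file.txt content, one element per line (illustrative data)
my @lines = (
    'Lorem ipsum dolor', 'sit amet,', 'consectetur adipiscing',
    'foo', 'bar', 'base',
    'elit,', 'sed do',
    'foo', 'bar', 'base',
);

my @selected;
for (@lines) {
    # Same condition as the one-liner: true from a line starting with
    # "foo" through the next line starting with "base", both included
    push @selected, $_ if /^foo/ .. /^base/;
}

print "$_\n" for @selected;
```

Only the two foo/bar/base blocks end up in @selected; everything outside them is skipped.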
Update: Rearranged so that the version based on my final understanding of the input comes first. Also edited for clarity.
Detect the line that starts a section to be written to its own file, and open the matching output file; write every line, including that one, to the filehandle of the current output file.
An example input file data_range.txt, as I understand the format:
@cluster t.1 has 100 elements
data 1
data 1 1
@cluster t.2 has 200 elements
data 2
@cluster t.3 has 300 elements
Print each @cluster t.N line, and the lines following it up to the next @cluster line, to a file clust(N).txt.
use warnings;
use strict;
my $file = shift || 'data_range.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
my $fh_out;
my $clustline = qr/\@cluster t\.([0-9]+) has [0-9]+ elements/;
while (<$fh>)
{
    if (/$clustline/) {
        # Reopening $fh_out closes the previously opened output file
        my $outfile = "clust($1).txt";
        open $fh_out, '>', $outfile or die "Can't open $outfile: $!";
    }
    print $fh_out $_;
}
For each @cluster line a new file with the corresponding number is opened; this closes the previous one, since we reuse the same filehandle. That line and all lines following it belong to that file, and they are printed there.
The code above assumes that the first line in the file is a @cluster line, and that every line in the file belongs to one of the output files. If that may not be so, we need to be more careful: (1) use a flag for when the writing starts, and (2) add a branch that allows us to skip lines.
# Replaces the $clustline line and the while loop above; $fh and $fh_out are set up as before
my $started_writing = 0;
my $clustline = qr/\@cluster t\.([0-9]+) has [0-9]+ elements/;
while (<$fh>)
{
    if (/$clustline/) {
        my $fout = "clust($1).txt";
        open $fh_out, '>', $fout or die "Can't open $fout for writing: $!";
        $started_writing = 1;
    }
    elsif (not $started_writing) {    # no output file opened yet
        next;
    }
    elsif (/dont_write_this_line/) {  # condition for lines to skip altogether
        next;
    }
    print $fh_out $_;
}
All of this assumes that a @cluster line cannot repeat with the same number. If it did, you would lose output data, since opening a file with '>' truncates it; so add a test if you aren't sure of your input (or open the output files in append mode).
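One way to act on this is to remember which cluster numbers have already been opened and switch to append mode ('>>') for a repeat. A minimal sketch, assuming a small made-up input held in an array and writing the clust(N).txt files into a temporary directory from File::Temp so that it can run anywhere; %seen is a name introduced here for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Illustrative input in which cluster 1 shows up twice
my @input = (
    '@cluster t.1 has 100 elements', 'data 1',
    '@cluster t.2 has 200 elements', 'data 2',
    '@cluster t.1 has 100 elements', 'more data 1',
);

my $dir       = tempdir(CLEANUP => 1);   # scratch directory, removed at exit
my $clustline = qr/\@cluster t\.([0-9]+) has [0-9]+ elements/;
my (%seen, $fh_out);

for my $line (@input) {
    if ($line =~ $clustline) {
        my $num  = $1;
        my $file = File::Spec->catfile($dir, "clust($num).txt");
        # First occurrence: create/truncate. Repeat: append, so nothing is lost.
        my $mode = $seen{$num}++ ? '>>' : '>';
        warn "cluster $num seen again, appending to $file\n" if $mode eq '>>';
        open $fh_out, $mode, $file or die "Can't open $file: $!";
    }
    print $fh_out "$line\n";
}
close $fh_out if $fh_out;
```

Dying instead of warning would be the stricter choice if a repeated cluster number means the input is malformed.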
With either of the two loops (the basic one or the one with the flag) we get the output file clust(1).txt
@cluster t.1 has 100 elements
data 1
data 1 1
then clust(2).txt
@cluster t.2 has 200 elements
data 2
and clust(3).txt, which holds only the @cluster t.3 line.
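To check this result without creating any files, here is a minimal sketch of the same loop run over the example lines, writing each section to an in-memory filehandle (open on a scalar reference) instead of clust(N).txt. The %out hash and the array of input lines are illustrative stand-ins, and, like the basic loop, it assumes the first line is a @cluster line. Reopening $fh_out on each @cluster line implicitly closes the previous handle, which is what lets one filehandle serve every output file.

```perl
use strict;
use warnings;

# The example data_range.txt content, one element per line
my @input = (
    '@cluster t.1 has 100 elements', 'data 1', 'data 1 1',
    '@cluster t.2 has 200 elements', 'data 2',
    '@cluster t.3 has 300 elements',
);

my $clustline = qr/\@cluster t\.([0-9]+) has [0-9]+ elements/;
my %out;      # cluster number => text that would go to clust(N).txt
my $fh_out;

for my $line (@input) {
    if ($line =~ $clustline) {
        # Reopening the same handle closes the previous in-memory "file" first
        open $fh_out, '>', \$out{$1} or die "Can't open in-memory file: $!";
    }
    print $fh_out "$line\n";
}
close $fh_out if $fh_out;

print "clust($_).txt:\n$out{$_}" for sort { $a <=> $b } keys %out;
```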
Original version, based on my initial understanding of the input and requirements
The range operator is nearly tailor-made for this. It keeps track of its true/false state across repeated evaluations: it turns true once its left operand is true and stays true until its right operand is true (that evaluation still returns true), and it is false again on the next evaluation. There is more to it; see "Range Operators" in perlop.
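To make that state tracking visible, here is a small sketch (the sample lines are made up) that records what the operator returns on each line. In scalar context it returns an empty string while false and a sequence number while true, with "E0" appended on the line where the right operand matched. It also contrasts .. with the three-dot ..., which does not test the right operand on the same line where the left one matched; that matters when the start and end patterns can match the same line, as with the question's /$line/ ... /$line/.

```perl
use strict;
use warnings;

# '--' can both start and end the range on the same line
my @lines = ('x', '--', 'a', '--', 'y');

my (@two_dot, @three_dot);
for (@lines) {
    my $r2 = /^--/ ..  /^--/;   # tests the right operand on the starting line too
    my $r3 = /^--/ ... /^--/;   # waits until the next line to test it
    push @two_dot,   $r2;
    push @three_dot, $r3;
}

# Show false values as "(false)" instead of an empty string
print join(' | ', map { length $_ ? $_ : '(false)' } @two_dot),   "\n";
print join(' | ', map { length $_ ? $_ : '(false)' } @three_dot), "\n";
```

With .. each '--' line opens and closes its own one-line range ("1E0"), while with ... the range runs from the first '--' through the second one.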
Made-up input file data_range.txt:
@cluster t.1 has 100 elements
@cluster t.2 has 200 elements
@cluster t.3 has 300 elements
@cluster t.4 has 400 elements
@cluster t.5 has 500 elements
Print everything between marker-lines 2 and 4, including the starting line but not the ending one.
use warnings;
use strict;
my $file = 'data_range.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
# Build the start and end patterns
my $beg = qr/^\@cluster t\.2 has 200 elements$/;
my $end = qr/^\@cluster t\.4 has 400 elements$/;
while (<$fh>)
{
    if (/$beg/ .. /$end/) {
        print if not /$end/;
    }
}
This prints lines 2 and 3. The .. operator turns true once the line ($_) matches $beg and stays true through the line that matches $end; it becomes false only on the line after that. So the condition is true for both the start and the end marker lines, which is why we also test for the end marker and don't print that line.
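perlop also documents that the value returned by the range operator is a sequence number with "E0" appended on the last line of the range. That gives a way to leave out the end marker without matching $end a second time. A minimal sketch of this alternative, reading from an in-memory string that stands in for data_range.txt (the names $data and @printed are only for illustration):

```perl
use strict;
use warnings;

# Stand-in for data_range.txt (illustrative data)
my $data = <<'END';
@cluster t.1 has 100 elements
@cluster t.2 has 200 elements
@cluster t.3 has 300 elements
@cluster t.4 has 400 elements
@cluster t.5 has 500 elements
END
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my $beg = qr/^\@cluster t\.2 has 200 elements$/;
my $end = qr/^\@cluster t\.4 has 400 elements$/;

my @printed;
while (<$fh>) {
    my $seq = /$beg/ .. /$end/;       # '' when false; 1, 2, ... when true
    next if !$seq or $seq =~ /E0$/;   # skip lines outside and the end marker
    print;
    push @printed, $_;
}
close $fh;
```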
If you would rather compare against the literal marker lines, you can test the strings for equality:
my $beg = q(@cluster t.2 has 200 elements);
my $end = q(@cluster t.4 has 400 elements);
while (my $line = <$fh>)
{
    chomp($line);
    if ($line eq $beg .. $line eq $end) {
        print "$line\n" if $line ne $end;
    }
}
This works the same way as the regex version. Note that now we have to chomp, since the trailing newline would make the eq test fail (and then we add \n back for printing).