Reputation: 33
I have a large .txt file made of thousands of articles, and I am trying to split it into individual files, one for each article, which I'd like to save as article_1, article_2, etc. Each article begins with a line containing the word /DOCUMENTS/. I am totally new to Perl, so any insight would be great (even advice on good documentation websites). Thanks a lot. So far, what I have tried looks like this:
#!/usr/bin/perl
use warnings;
use strict;
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
my $destination = "file$id.txt";
open IN, $source or die "can t read $source: $!\n";
while (<IN>)
{
    {
        open OUT, ">$destination" or die "can t write $destination: $!\n";
        if (/DOCUMENTS/)
        {
            close OUT ;
            $id++;
        }
    }
}
close IN;
Upvotes: 3
Views: 3537
Reputation: 29854
Let's say that /DOCUMENTS/ appears by itself on a line. Then you can make that line the record separator.
use strict;
use warnings;
use English qw<$RS>;
use File::Slurp qw<write_file>;    # CPAN module, not part of core Perl
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
{   local $RS = "\n/DOCUMENTS/\n";
    open my $in, '<', $source or die "can't read $source: $!\n";
    while ( <$in> ) {
        chomp;    # removes the trailing separator "\n/DOCUMENTS/\n", not just "\n"
        write_file( 'file' . ( ++$id ) . '.txt', $_ );
    }
    # $in is scoped to the surrounding braces (my "local block"), so this
    # explicit close is not strictly necessary: the handle closes when the block ends.
    close $in;
}
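To see what the record separator does in isolation, here is a minimal sketch that applies the same idea to an in-memory string (Perl can open a filehandle on a scalar reference), so it runs without the real input file. The helper name records_from_string is hypothetical, and the sample text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records using $separator as the
# input record separator ($/), the same mechanism the answer uses on a file.
sub records_from_string {
    my ( $text, $separator ) = @_;
    open my $in, '<', \$text or die "Can't open in-memory string: $!\n";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records;
    while ( my $record = <$in> ) {
        chomp $record;        # removes a trailing $/ (the whole separator), if present
        push @records, $record;
    }
    close $in;
    return @records;
}

my @articles = records_from_string(
    "first article\n/DOCUMENTS/\nsecond article\n/DOCUMENTS/\nthird article\n",
    "\n/DOCUMENTS/\n",
);
print scalar(@articles), " records\n";    # prints "3 records"
```

The last record keeps its trailing newline because no separator follows it. Also, since the separator starts with a newline, a /DOCUMENTS/ marker on the very first line of the file is not matched and stays at the top of the first record.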
NOTES:
- use English gives you $RS as a readable alias for the global input record separator, whose "messy name" is $/. See perldoc perlvar.
- Because I assumed '/DOCUMENTS/' is all by itself on a line, I specified newline + '/DOCUMENTS/' + newline as the separator. If it is instead part of a path that occurs somewhere on the line, that particular value will not work as the record separator.
Upvotes: 4
Reputation: 3682
Have you read Programming Perl? It is the best book for getting started!
I'm not sure exactly what you are trying to do. I assume you have a text file containing articles and want to write each article to a separate file.
use warnings;
use strict;
use autodie;    # open and close now die automatically on failure
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
my $destination = "file$id.txt";
open my $IN, '<', $source;
# open the first output file (file0.txt gets any text before the first marker)
open my $OUT, '>', $destination;
while (<$IN>) {
    chomp;    # remove the trailing "\n"
    if ($_ eq '/DOCUMENTS/') {    # assumes the marker is alone on its line; adjust if not
        close $OUT;
        $id++;
        $destination = "file$id.txt";
        open $OUT, '>', $destination;    # reuse the outer $OUT; "open my $OUT" here would shadow it
    } else {
        print {$OUT} $_, "\n";    # write the line into the current file$id.txt
    }
}
close $OUT;
close $IN;
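If the marker can appear anywhere on the line (the question says each article begins with "a line containing the word /DOCUMENTS/"), the eq comparison above will not match it. Here is a minimal sketch that tests each line with a regex instead, keeps the marker line as the first line of each article, and names the files article_1.txt, article_2.txt, ... as the question asked. The helper name split_articles and the .txt suffix are my own assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: split $source into one file per article. A new
# article starts at any line that contains /DOCUMENTS/ (not only a line
# that is exactly /DOCUMENTS/); that marker line is kept as the article's
# first line. Files are named "${prefix}1.txt", "${prefix}2.txt", ...
# Returns the number of article files written.
sub split_articles {
    my ( $source, $prefix ) = @_;
    $prefix = 'article_' unless defined $prefix;

    open my $in, '<', $source or die "Can't read $source: $!\n";

    my $id = 0;
    my $out;    # stays undef until the first marker line is seen
    while ( my $line = <$in> ) {
        if ( $line =~ m{/DOCUMENTS/} ) {
            # Finish the previous article (if any) and start the next one.
            if ($out) {
                close $out or die "Can't close article $id: $!\n";
            }
            $id++;
            my $destination = "$prefix$id.txt";
            open $out, '>', $destination or die "Can't write $destination: $!\n";
        }
        # Text before the first marker belongs to no article and is skipped.
        print {$out} $line if $out;
    }
    if ($out) {
        close $out or die "Can't close article $id: $!\n";
    }
    close $in;
    return $id;
}

# Usage: perl split_articles.pl 2010_FTOL_GRbis.txt [prefix]
split_articles(@ARGV) if @ARGV;
```

Because it reads one line at a time, memory use stays small even for a very large input file. Matching the marker anywhere on the line also means a line such as a path containing /DOCUMENTS/ starts a new article, so tighten the regex if that is not what you want.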
Upvotes: 2