Reputation: 73
currently I have the following script
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $header = <> . <>;
print $header;
my $last_sequence_number = 0;
open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;
while (<>) {
my ($key) = split;
next unless $key =~ m/^\d+$/;
my $sequence_number = int( $key / 1000 );
if ( not $sequence_number == $last_sequence_number ) {
print "Opening new file for $sequence_number\n";
close($output);
open( $output, ">", "output.$sequence_number.out" ) or die $!;
print {$output} $header unless $seen{$sequence_number}++;
$last_sequence_number = $sequence_number;
}
print {$output} $_;
}
the script splits a file into other files with the pattern file 1 file 2 ... now I would need to pass to the script another parameter which allows to specify a prefix for the output so if this additional input is 1
then the output would be
1_file1
,1_file2
....and so on.. how could I do that?
I know that something like
use Getopt::Long;
could be used?
tried this
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $header = <> . <>;
print $header;
my ( $suffix, $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!;
my $last_sequence_number = 0;
open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;
while (<$input>) {
my ($key) = split;
next unless $key =~ m/^\d+$/;
my $sequence_number = int( $key / 1000 );
if ( not $sequence_number == $last_sequence_number ) {
print "Opening new file for $sequence_number\n";
close($output);
open( $output, ">", "output.$sequence_number.out" ) or die $!;
print {$output} $header unless $seen{$sequence_number}++;
$last_sequence_number = $sequence_number;
}
print {$output} $_;
}
but that is not working. What is wrong?
I get
No such file or directory at ./spl.pl line 10, <> line 2.
after the header is printed.
Upvotes: 0
Views: 90
Reputation: 26121
I would do something like this.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Getopt::Long qw(:config bundling);
use Pod::Usage;
{
my $man = 0;
my $help = 0;
my $verbose = 0;
my $prefix = '';
my $suffix = '';
my $header_lines = 2;
my $bunch_size = 1000;
GetOptions(
'help|?' => \$help,
'man' => \$man,
'verbose|v+' => \$verbose,
'prefix|p=s' => \$prefix,
'suffix|s=s' => \$suffix,
'header|h=i' => \$header_lines,
'bunch|batch|bucket|b=i' => \$bunch_size
) or pod2usage(2);
pod2usage(1) if $help;
pod2usage( -exitval => 0, -verbose => 2 ) if $man;
pod2usage(
-exitval => 3,
-message => "Headers lines can't be negative number"
) if $header_lines < 0;
pod2usage(
-exitval => 4,
-message => "Bunch size has to be positive"
) unless $bunch_size > 0;
my $header = '';
$header .= <> for 1 .. $header_lines;
my %seen;
my $current_output_number = -1;
sub key2output { int( shift() / $bunch_size ) }
sub set_output {
my $output_number = shift;
if ( $output_number != $current_output_number ) {
my $seen = $seen{$output_number}++;
printf STDOUT "Opening %sfile for %d\n", $seen ? '' : 'new ',
$output_number
if $verbose;
open my $fh, $seen ? '>>' : '>',
$prefix . $output_number . $suffix;
select $fh;
print $header unless $seen;
$current_output_number = $output_number;
}
}
}
while (<>) {
my ($key) = /^(\d+)\s/;
next unless defined $key;
set_output( key2output($key) );
print;
}
__END__
=head1 NAME
code.pl - splits file by first number by thousands
=head1 SYNOPSIS
code.pl [options] [file ...]
Options:
--help brief help message
--man full documentation
--prefix output filename prefix
--suffix outpit filename suffix
--header number of header lines (default: 2)
=head1 OPTIONS
=over 8
=item B<--help>
Print a brief help message and exits.
=item B<--man>
Prints the manual page and exits.
=back
=head1 DESCRIPTION
B<This program> will read the given input file(s) and do something
useful with the contents thereof.
=cut
Just finish documentation and you can ship it to your colleagues.
Upvotes: 1
Reputation: 69224
As Sobrique says, your problem is the magical nature of <>
. But I don't think that it's as hard to deal with as he thinks.
The point is that <>
looks at the current value of @ARGV
. So you can add other command line arguments as long as you ensure that you have removed them from @ARGV
before you use <>
for the first time.
So change your code so that it starts like this:
my %seen;
my $prefix = shift;
my $header = <> . <>;
You can then call your program like this:
$ your_program.pl prefix_goes_here list of file names...
Everything else should now work the same as it currently does, but you have your prefix stored away in $prefix
so that you can use it in your print
statements.
I hope that's what you wanted. Your question isn't particularly clear.
Upvotes: 2
Reputation: 53478
The problem you've got is that the diamond operator <>
is a piece of special perl magic.
It takes 'all filenames on command line' opens them and processes them in order.
To do what you're trying to do:
my ( $suffix, $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!;
Then you can change your while loop to:
while ( <$input> ) {
And modify the output filename according to your desires. The key different there is that it'll only take one filename at that point - first arg is suffix, second is name.
You could perhaps extend this with:
my ( $suffix, @names ) = @ARGV;
And then run a foreach loop:
foreach my $filename ( @names ) {
open .... #etc
Upvotes: 0