ktoui
ktoui

Reputation: 73

modification of script in perl

currently I have the following script

#!/usr/bin/env perl
use strict;
use warnings;

my %seen;

my $header = <> . <>;
print $header;

my $last_sequence_number = 0;

open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;

while (<>) {
    my ($key) = split;
    next unless $key =~ m/^\d+$/;
    my $sequence_number = int( $key / 1000 );
    if ( not $sequence_number == $last_sequence_number ) {
        print "Opening new file for $sequence_number\n";
        close($output);
        open( $output, ">", "output.$sequence_number.out" ) or die $!;
        print {$output} $header unless $seen{$sequence_number}++;
        $last_sequence_number = $sequence_number;
    }
    print {$output} $_;
}

the script splits a file into other files with the pattern file 1 file 2 ... now I would need to pass to the script another parameter which allows to specify a prefix for the output so if this additional input is 1 then the output would be

1_file1,1_file2....and so on.. how could I do that?

I know that something like

use Getopt::Long;

could be used?

tried this

#!/usr/bin/env perl
use strict;
use warnings;

my %seen;

my $header = <> . <>;
print $header;
my ( $suffix, $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!;                                   
my $last_sequence_number = 0;

open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;

while (<$input>) {
    my ($key) = split;
    next unless $key =~ m/^\d+$/;
    my $sequence_number = int( $key / 1000 );
    if ( not $sequence_number == $last_sequence_number ) {
        print "Opening new file for $sequence_number\n";
        close($output);
        open( $output, ">", "output.$sequence_number.out" ) or die $!;
        print {$output} $header unless $seen{$sequence_number}++;
        $last_sequence_number = $sequence_number;
    }
    print {$output} $_;
}

but that is not working. What is wrong?

I get

 No such file or directory at ./spl.pl line 10, <> line 2.

after the header is printed.

Upvotes: 0

Views: 90

Answers (3)

Hynek -Pichi- Vychodil
Hynek -Pichi- Vychodil

Reputation: 26121

I would do something like this.

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Getopt::Long qw(:config bundling);
use Pod::Usage;

{
    my $man          = 0;
    my $help         = 0;
    my $verbose      = 0;
    my $prefix       = '';
    my $suffix       = '';
    my $header_lines = 2;
    my $bunch_size   = 1000;

    GetOptions(
        'help|?'                 => \$help,
        'man'                    => \$man,
        'verbose|v+'             => \$verbose,
        'prefix|p=s'             => \$prefix,
        'suffix|s=s'             => \$suffix,
        'header|h=i'             => \$header_lines,
        'bunch|batch|bucket|b=i' => \$bunch_size
    ) or pod2usage(2);
    pod2usage(1) if $help;
    pod2usage( -exitval => 0, -verbose => 2 ) if $man;
    pod2usage(
        -exitval => 3,
        -message => "Headers lines can't be negative number"
    ) if $header_lines < 0;
    pod2usage(
        -exitval => 4,
        -message => "Bunch size has to be positive"
    ) unless $bunch_size > 0;

    my $header = '';
    $header .= <> for 1 .. $header_lines;

    my %seen;
    my $current_output_number = -1;

    sub key2output { int( shift() / $bunch_size ) }

    sub set_output {
        my $output_number = shift;
        if ( $output_number != $current_output_number ) {
            my $seen = $seen{$output_number}++;
            printf STDOUT "Opening %sfile for %d\n", $seen ? '' : 'new ',
                $output_number
                if $verbose;
            open my $fh, $seen ? '>>' : '>',
                $prefix . $output_number . $suffix;
            select $fh;
            print $header unless $seen;
            $current_output_number = $output_number;
        }
    }
}

while (<>) {
    my ($key) = /^(\d+)\s/;
    next unless defined $key;
    set_output( key2output($key) );
    print;
}

__END__

=head1 NAME

code.pl - splits file by first number by thousands

=head1 SYNOPSIS

code.pl [options] [file ...]

 Options:
   --help            brief help message
   --man             full documentation
   --prefix          output filename prefix
   --suffix          outpit filename suffix
   --header          number of header lines (default: 2)

=head1 OPTIONS

=over 8

=item B<--help>

Print a brief help message and exits.

=item B<--man>

Prints the manual page and exits.

=back

=head1 DESCRIPTION

B<This program> will read the given input file(s) and do something
useful with the contents thereof.

=cut

Just finish documentation and you can ship it to your colleagues.

Upvotes: 1

Dave Cross
Dave Cross

Reputation: 69224

As Sobrique says, your problem is the magical nature of <>. But I don't think that it's as hard to deal with as he thinks.

The point is that <> looks at the current value of @ARGV. So you can add other command line arguments as long as you ensure that you have removed them from @ARGV before you use <> for the first time.

So change your code so that it starts like this:

my %seen;

my $prefix = shift;

my $header = <> . <>;

You can then call your program like this:

$ your_program.pl prefix_goes_here list of file names...

Everything else should now work the same as it currently does, but you have your prefix stored away in $prefix so that you can use it in your print statements.

I hope that's what you wanted. Your question isn't particularly clear.

Upvotes: 2

Sobrique
Sobrique

Reputation: 53478

The problem you've got is that the diamond operator <> is a piece of special perl magic.

It takes 'all filenames on command line' opens them and processes them in order.

To do what you're trying to do:

my ( $suffix, $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!; 

Then you can change your while loop to:

while ( <$input> ) {

And modify the output filename according to your desires. The key different there is that it'll only take one filename at that point - first arg is suffix, second is name.

You could perhaps extend this with:

my ( $suffix, @names ) = @ARGV;

And then run a foreach loop:

foreach my $filename ( @names ) { 
    open .... #etc

Upvotes: 0

Related Questions