chrisyyy

Reputation: 45

Reading and processing files word by word efficiently in perl

I'm very new to Perl and would like to know how I could make the code below faster. Here is my current code. Any help is very much appreciated.

#!/usr/bin/perl

use strict;
use warnings;

open( FILE_IN, "<practicecase.txt" ) or die "$!";

open( FILE_OUT, ">extracted.txt" ) or die "$!";

print "Extracting inputs\n";

while (<FILE_IN>) {
    if ( $_ =~ m/^second_word/ ) {
        my @filepath2 = split (/\s+/, $_);
        print FILE_OUT $filepath2[1]."\n";
    }
    if ($_ =~ m/^first_word/ ) {
        my @filepath1 = split (/\s+/, $_);
        print FILE_OUT $filepath1[1]."\n";
    }
}

exit;

My input file, practicecase.txt, is simply:

first_word some/filepath
second_word another/filepath

My output file, extracted.txt, looks like:

some/filepath
another/filepath

Thank you so much!

Upvotes: 0

Views: 54

Answers (2)

Borodin

Reputation: 126762

This is about as fast as your algorithm is going to go. The optimisations I have made are to use a single regex pattern to match either first_word or second_word at the beginning of the line, and to use that same pattern to capture the second field in the line.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;
use autodie;

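open my $in_fh,  '<', 'practicecase.txt';
open my $out_fh, '>', 'extracted.txt';

# Print the status message before select, so it goes to STDOUT
# rather than into extracted.txt
print "Extracting inputs\n";

# From here on, a bare print writes to the output file
select $out_fh;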

while ( <$in_fh> ) {
    print "$1\n" if / ^ (?:first|second)_word \s+ (\S+) /x;
}

Upvotes: 3

Ed Dunn

Reputation: 1172

A slight modification of Borodin's code:

#!/usr/bin/perl

use strict;
use warnings;
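use Cwd qw( abs_path );
use File::Basename qw( dirname );

# Resolve both files relative to the directory containing this script
my $dir      = dirname( abs_path($0) );
my $file_in  = "$dir/practicecase.txt";
my $file_out = "$dir/extracted.txt";

# Three-argument open keeps the mode separate from the file name
open my $in,  '<', $file_in  or die "Cannot open $file_in: $!";
open my $out, '>', $file_out or die "Cannot open $file_out: $!";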

print "Extracting inputs\n";

while ( <$in> ) {
    print $out "$1\n" if / ^ (?:first|second)_word\s+(.+?)$ /x;
}

close($in);
close($out);
exit;

I changed the regex to capture everything up to the end of the line, in case the paths in your original text file contain spaces or other whitespace that you want to keep. Also, don't forget to close your file handles.
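To make the difference between the two patterns concrete, here is a minimal sketch that runs both of them on the same line. The sub names capture_field and capture_rest and the sample line are made up for illustration; only the two regexes come from the answers.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: the pattern that captures only the second field (\S+)
sub capture_field {
    my ($line) = @_;
    return $line =~ / ^ (?:first|second)_word \s+ (\S+) /x ? $1 : undef;
}

# Hypothetical helper: the pattern that captures everything up to the end of the line
sub capture_rest {
    my ($line) = @_;
    return $line =~ / ^ (?:first|second)_word\s+(.+?)$ /x ? $1 : undef;
}

# Made-up sample line whose path contains a space
my $line = "first_word some/file path\n";

print capture_field($line), "\n";    # prints: some/file
print capture_rest($line),  "\n";    # prints: some/file path
```

If the paths never contain whitespace, both patterns give the same result. Note that the second pattern also keeps any trailing spaces on the line, which may or may not be what you want.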

Upvotes: -1
