jdamae

Reputation: 3909

perl - optimal way to process many similarly named text files

I have several thousand text files in a directory that I need to process. They are similarly named, but with some variation:

/home/dir/abc123.name.efg-joe_p000.20110124.csv
/home/dir/abc456.name.efg-jon_p000.20110124.csv
/home/dir/abc789.name.efg-bob_p000.20110124.csv

I have a Perl script that can process one file at a time without a problem:

./script.pl /home/dir/abc123.name.efg-joe_p000.20110124.csv

What's the best way to pass in and process many of these files, one at a time? Am I looking at @ARGV for this? Should I list the files in a separate file and then use that as input?
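That is, should script.pl just loop over @ARGV, something like this (process_file being a stand-in for my existing per-file logic)?

for my $file (@ARGV) {
    process_file($file);    # run the existing per-file logic on each argument
}

Then I could call it as ./script.pl /home/dir/*.csv and let the shell expand the list.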

Upvotes: 1

Views: 375

Answers (3)

Eugene Yarmash

Reputation: 149776

You can use readdir to read the filenames one at a time:

my $some_dir = '/home/dir';
opendir my $dh, $some_dir or die "can't opendir $some_dir: $!";

while (defined(my $file = readdir($dh))) {
    next if $file =~ /^\./;    # skip dotfiles, including . and ..
    print "$file\n";
}

closedir $dh;
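To process each file rather than just print it, swap the print for a call into your existing logic. Note that readdir returns bare filenames, so the directory has to be re-attached; process_file() here is a hypothetical stand-in for whatever script.pl does per file:

while (defined(my $file = readdir($dh))) {
    next unless $file =~ /\.csv$/;       # keep only the CSV files
    process_file("$some_dir/$file");     # readdir returns bare names; prepend the directory
}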

Upvotes: 2

DVK

Reputation: 129403

You can pass a file pattern (in glob format) as a parameter, hand it to the glob call to list the matching files, and then process them in a loop one at a time.

./script.pl -file_pattern "/home/dir/abc123.name.efg-joe_p000.*.csv"

In your script:

my @files = glob($file_pattern);
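Putting it together, a minimal sketch, assuming -file_pattern is parsed with Getopt::Long and process_file() stands in for the script's existing per-file logic:

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

my $file_pattern;
GetOptions('file_pattern=s' => \$file_pattern)
    or die "Usage: $0 -file_pattern <glob>\n";
die "no -file_pattern given\n" unless defined $file_pattern;

# Expand the pattern into the matching filenames and process each one.
for my $file (glob $file_pattern) {
    process_file($file);
}

Quoting the pattern on the command line, as in the example above, keeps the shell from expanding it before Perl sees it.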

Upvotes: 4

pilcrow

Reputation: 58534

If by "optimal" you mean "no code changes," and you are, as your pathnames suggest, on a *NIX-like system, try this:

$ find /home/dir -type f -name \*.csv -exec ./script.pl {} \;

If script.pl can handle multiple filename arguments, you can batch them, say, 10 at a time:

$ find /home/dir -type f -name \*.csv | xargs -n 10 ./script.pl
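If any of the filenames could contain whitespace, the null-delimited variants are safer; GNU xargs can also run invocations in parallel with -P:

$ find /home/dir -type f -name \*.csv -print0 | xargs -0 -n 10 -P 4 ./script.pl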

Upvotes: 4
