Reputation: 33
I had a question using Perl's readdir(). I want to gather all the files in a directory that have the same prefix file name I specified. So, for each prefix, I need to use Perl's readdir() to grep all related files.
Suppose the prefix is "abc", there are several files with the names "abc_1", "abc_2", etc.
However, I noticed that if I put opendir, closedir outside of a loop (loop through a list of file name prefixes), I can only grep the very first prefix from the dir -- all the following grepping failed. If I chose to call opendir and closedir each time in the loop, it worked fine but I'm afraid it is not efficient at all.
My question is how can I make it more efficient? It is weird that I can't call readdir multiple times in a loop.
Thanks a lot in advance!
-Jin
Upvotes: 3
Views: 2420
Reputation: 2350
I would code this in a single pass as follows:
while readdir() returns a file name
if the file prefix has not been seen before
record prefix and create directory for this prefix
end if
move (copy?) file to correct directory
end while
For the anally retentive here is some (untested) code that should work. Error handling is left as an exercise for the reader.
require File::Copy;
my $old_base_dir = "original_directory_path";
opendir (my $dir_handle, "$old_base_dir");
my %dir_list;
my $new_base_dir = "new_directory_path";
while (my $file_name = readdir($dir_handle)) {
next if ! -f $file_name; # only move regular files
(my $prefix) = split /_/, $file_name, 1; # assume first _ marks end of prefix
mkdir "$new_base_dir/$prefix" unless exists $dir_list{$prefix};
move("$old_base_dir/$file_name", "$new_base_dir/$file_name"); # assume unix system
}
closedir($dir_handle};
Upvotes: -2
Reputation: 139681
Use the Text::Trie module to group files in one pass through readdir
:
use File::Spec::Functions qw/ catfile /;
use Text::Trie qw/ Trie walkTrie /;
sub group_files {
my($dir,$pattern) = @_;
opendir my $dh, $dir or die "$0: opendir $dir: $!";
my @trie = Trie readdir $dh;
my @groups;
my @prefix;
my $group = [];
my $exitnode = sub {
pop @prefix;
unless (@prefix) {
push @groups => $group if @$group;
$group = [];
}
};
my $leaf = sub {
local $_ = join "" => @prefix;
if (/$pattern/) {
my $full = catfile $dir => "$_$_[0]";
push @$group => $full if -f $full;
}
$exitnode->() unless @prefix;
};
my $node = sub { push @prefix => $_[0] };
@$_[0,1,5] = ($leaf, $node, $exitnode) for \my @callbacks;
walkTrie @callbacks => @trie;
wantarray ? @groups : \@groups;
}
You might use it as in
my($pattern,$dir) = @ARGV;
$pattern //= "^";
$dir //= ".";
my $qr = eval "qr/$pattern/" || die "$0: bad pattern ($pattern)\n";
my @groups = group_files $dir, $qr;
use Data::Dumper;
print Dumper \@groups;
For example:
$ ls abc_1 abc_12 abc_2 abc_3 abc_4 prefixes xy_7 xyz_1 xyz_2 xyz_3 $ ./prefixes $VAR1 = [ [ './prefixes' ], [ './abc_4', './abc_1', './abc_12', './abc_3', './abc_2' ], [ './xy_7', './xyz_1', './xyz_3', './xyz_2' ] ];
Use the optional regular-expression argument as a predicate on prefixes:
$ ./prefixes '^.{3,}' $VAR1 = [ [ './abc_4', './abc_1', './abc_12', './abc_3', './abc_2' ], [ './xyz_1', './xyz_3', './xyz_2' ] ]; $ ./prefixes '^.{2,}' $VAR1 = [ [ './abc_4', './abc_1', './abc_12', './abc_3', './abc_2' ], [ './xy_7', './xyz_1', './xyz_3', './xyz_2' ] ];
Upvotes: -1
Reputation: 30851
Directory (and file) handles are iterators. Reading from one consumes data, you need to either store that data or reset the position of the iterator. Closing and reopening is the hard way; use rewinddir
instead.
Alternately, use glob
to do the reading and filtering in one step.
Upvotes: 8
Reputation: 42747
Why don't you read all the files once and then perform the filtering on that list?
Upvotes: 6