Jin

Reputation: 33

How can I use Perl's readdir multiple times efficiently on the same directory?

I have a question about Perl's readdir(). I want to gather all the files in a directory whose names start with a prefix I specify. So, for each prefix, I call readdir() and grep the results for the matching files.

Suppose the prefix is "abc" and there are several files named "abc_1", "abc_2", etc.

However, I noticed that if I put opendir and closedir outside the loop (which iterates over a list of file name prefixes), the grep only finds files for the very first prefix; for every following prefix it returns nothing. If I call opendir and closedir on each iteration of the loop, it works fine, but I'm afraid that is not efficient at all.

My question is: how can I make this more efficient? It seems weird that I can't call readdir multiple times in a loop.

Thanks a lot in advance!

-Jin

Upvotes: 3

Views: 2420

Answers (6)

David Harris

Reputation: 2350

I would code this in a single pass as follows:

while readdir() returns a file name
    if the file prefix has not been seen before
        record prefix and create directory for this prefix
    end if
    move (copy?) file to correct directory
end while

For the anally retentive here is some (untested) code that should work. Error handling is left as an exercise for the reader.

use strict;
use warnings;
use File::Copy qw(move);   # "use" (not "require") so that move() is imported

my $old_base_dir = "original_directory_path";
opendir(my $dir_handle, $old_base_dir);

my %dir_list;   # prefixes whose target directory has already been created
my $new_base_dir = "new_directory_path";

while (defined(my $file_name = readdir($dir_handle))) {
    next if ! -f "$old_base_dir/$file_name";   # only move regular files
    my ($prefix) = split /_/, $file_name, 2;   # assume first _ marks end of prefix

    # create the directory the first time this prefix is seen
    mkdir "$new_base_dir/$prefix" unless $dir_list{$prefix}++;

    move("$old_base_dir/$file_name", "$new_base_dir/$prefix/$file_name"); # assume unix system
}

closedir($dir_handle);

Upvotes: -2

Greg Bacon

Reputation: 139681

Use the Text::Trie module to group files in one pass through readdir:

use File::Spec::Functions qw/ catfile /;
use Text::Trie qw/ Trie walkTrie /;

sub group_files {
  my($dir,$pattern) = @_;

  opendir my $dh, $dir or die "$0: opendir $dir: $!";

  my @trie = Trie readdir $dh;

  my @groups;
  my @prefix;
  my $group = [];

  my $exitnode = sub {
    pop @prefix;
    unless (@prefix) {
      push @groups => $group if @$group;
      $group = [];
    }
  };

  my $leaf = sub {
    local $_ = join "" => @prefix;
    if (/$pattern/) {
      my $full = catfile $dir => "$_$_[0]";
      push @$group => $full if -f $full;
    }
    $exitnode->() unless @prefix;
  };

  my $node = sub { push @prefix => $_[0] };

  @$_[0,1,5] = ($leaf, $node, $exitnode) for \my @callbacks;
  walkTrie @callbacks => @trie;

  wantarray ? @groups : \@groups;
}

You might use it as in

my($pattern,$dir) = @ARGV;

$pattern //= "^";
$dir     //= ".";

my $qr = eval "qr/$pattern/" || die "$0: bad pattern ($pattern)\n";
my @groups = group_files $dir, $qr;

use Data::Dumper;
print Dumper \@groups;

For example:

$ ls
abc_1  abc_12  abc_2  abc_3  abc_4  prefixes  xy_7  xyz_1  xyz_2  xyz_3

$ ./prefixes
$VAR1 = [
          [
            './prefixes'
          ],
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xy_7',
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];

Use the optional regular-expression argument as a predicate on prefixes:

$ ./prefixes '^.{3,}'
$VAR1 = [
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];

$ ./prefixes '^.{2,}'
$VAR1 = [
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xy_7',
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];

Upvotes: -1

Zano

Reputation: 2761

Why not just use a glob, as in @files = <abc_*>?

Upvotes: 1

Penfold

Reputation: 2568

Would rewinddir() be of assistance at this juncture?

Upvotes: 1

Michael Carman

Reputation: 30851

Directory (and file) handles are iterators. Reading from one consumes data, so you need to either store that data or reset the position of the iterator. Closing and reopening is the hard way; use rewinddir instead.
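
To illustrate the rewinddir route, here is a minimal sketch. The scratch directory, the sample file names, and the @prefixes list are assumptions made up for the example; only opendir, readdir, rewinddir, and closedir come from the discussion above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample data: a scratch directory with a few prefixed files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(abc_1 abc_2 xyz_1)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my @prefixes = qw(abc xyz);   # assumed list of prefixes
my %files_for;

opendir my $dh, $dir or die "Cannot open $dir: $!";
for my $prefix (@prefixes) {
    rewinddir $dh;            # reset the iterator to the first entry
    # readdir in list context returns every remaining entry
    my @matches = grep { /^\Q$prefix\E_/ } readdir $dh;
    $files_for{$prefix} = [ sort @matches ];
}
closedir $dh;

print "$_: @{ $files_for{$_} }\n" for @prefixes;
```

Without the rewinddir call, the first list-context readdir would drain the handle and every later prefix would get an empty list, which matches the behaviour described in the question.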

Alternatively, use glob to do the reading and filtering in one step.
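
A rough sketch of the glob variant follows. The scratch directory and file names are again hypothetical, and the pattern assumes the directory path contains no whitespace or glob metacharacters, since Perl's built-in glob splits its argument on whitespace.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Hypothetical sample data: a scratch directory with a few prefixed files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(abc_1 abc_2 xyz_1)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my %files_for;
for my $prefix (qw(abc xyz)) {
    # glob expands the shell-style pattern and returns the matching paths
    my @paths = glob("$dir/${prefix}_*");
    $files_for{$prefix} = [ sort @paths ];
}

for my $prefix (sort keys %files_for) {
    my @names = map { basename($_) } @{ $files_for{$prefix} };
    print "$prefix: @names\n";
}
```

Each glob call rescans the directory, so this is convenient rather than faster than rewinddir; it simply removes the handle bookkeeping.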

Upvotes: 8

jamessan

Reputation: 42747

Why don't you read all the files once and then perform the filtering on that list?
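
Something along these lines, as a sketch only: the scratch directory, file names, and prefix list are invented for the example, and the "prefix followed by an underscore" match is taken from the file names in the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample data: a scratch directory with a few prefixed files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(abc_1 abc_2 xyz_1 other)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

# Read the directory exactly once, keeping only regular files.
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @entries = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;

# Filter the in-memory list as many times as needed.
my @prefixes = qw(abc xyz);   # assumed list of prefixes
my %files_for;
for my $prefix (@prefixes) {
    my @matches = grep { /^\Q$prefix\E_/ } @entries;
    $files_for{$prefix} = [ sort @matches ];
}

print "$_: @{ $files_for{$_} }\n" for @prefixes;
```

The directory is scanned once no matter how many prefixes there are; the cost is holding the full list of names in memory.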

Upvotes: 6
