AnkP

Reputation: 651

Perl - concatenate files with similar names pattern and write concatenated file names to a list

I have a directory with multiple sub-directories in it, and each subdir has a fixed set of files, one for each category, like:

1) Main_dir
1.1) Subdir1 with files
 - Test.1.age.txt
 - Test.1.name.txt
 - Test.1.place.csv
 ...
1.2) Subdir2 with files
 - Test.2.age.txt
 - Test.2.name.txt
 - Test.2.place.csv
 ...

There are around 20 folders with 10 files in each. I need to first concatenate the files in each category, e.g. Test.1.age.txt and Test.2.age.txt, into a Combined.age.txt file, and once all the concatenation is done I want to print those file names to a new Final_list.txt file, like:

./Main_dir/Combined.age.txt
./Main_dir/Combined.name.txt

I am able to read all the files from all the subdirs into an array, but I am not sure how to do a pattern search for the similar file names. I will be able to figure out the printout part of the code myself. Can anyone please share how to do this pattern search for the concatenation? My code so far:

use warnings;
use strict;
use File::Spec;
use Data::Dumper;
use File::Basename;

my $testdir = './Main_dir';
my @Comp_list = glob("$testdir/test_dir*/*.txt");

foreach my $file (@Comp_list) {
    print "$file\n";
}

I am trying to do the pattern search on the contents of @Comp_list, which I surely need to learn:

foreach my $f1 (@Comp_list) {
    if ( $f1 =~ /\.txt$/ ) {
        print $f1;    # check if reading the file right

        # push it to a file using concatfile(
    }
}

Thanks a lot!

Upvotes: 1

Views: 731

Answers (2)

Borodin

Reputation: 126722

This should work for you. I've only tested it superficially, as it would take me a while to create some test data; since you have some at hand, I'm hoping you'll report back with any problems.

The program segregates all the files found by the equivalent of your glob call, and puts them in buckets according to their type. I've assumed that the names are exactly as you've shown, so the type is the penultimate field when the file name is split on dots; i.e. the type of Test.1.age.txt is age.
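That splitting rule can be checked in isolation with a sample name (a minimal sketch, using a file name from the question):

```perl
use strict;
use warnings;

# Split a sample name on dots; the penultimate field is the category
my @fields = split /\./, 'Test.1.age.txt';
my $type   = lc $fields[-2];
print "$type\n";    # prints "age"
```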

Having collected all of the file lists, I've used a technique originally designed to read through all of the files specified on the command line. If @ARGV is set to a list of files, then an <ARGV> operation will read through all the files as if they were one, and so they can easily be copied to a new output file.
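The <ARGV> idiom can be sketched on its own, independent of the full program below (the demo file names here are hypothetical, created only for the illustration):

```perl
use strict;
use warnings;

# Create two small hypothetical input files for the demonstration
for my $name ('demo_a.txt', 'demo_b.txt') {
    open my $fh, '>', $name or die "Cannot write $name: $!";
    print $fh "line from $name\n";
    close $fh;
}

# Assigning to @ARGV makes <ARGV> read all the files as one stream
@ARGV = ('demo_a.txt', 'demo_b.txt');
while ( my $line = <ARGV> ) {
    print $line;
}

unlink 'demo_a.txt', 'demo_b.txt';
```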

If you need the files concatenated in a specific order then I will have to amend my solution. At present they will be processed in the order that glob returns them -- probably in lexical order of their file names, but you shouldn't rely on that

use strict;
use warnings 'all';
use v5.14.0;    # For autoflush method

use File::Spec::Functions 'catfile';

use constant ROOT_DIR => './Main_dir';

my %files;

my $pattern = catfile(ROOT_DIR, 'test_dir*', '*.txt');

for my $file ( glob $pattern ) {
    my @fields = split /\./, $file;
    my $type = lc $fields[-2];
    push @{ $files{$type} }, $file;
}

STDOUT->autoflush;    # Get prompt reports of progress

for my $type ( keys %files ) {

    my $outfile = catfile(ROOT_DIR, "Combined.$type.txt");
    open my $out_fh, '>', $outfile or die qq{Unable to open "$outfile" for output: $!};

    my $files = $files{$type};

    printf qq{Writing aggregate file "%s" from %d input file%s ... },
            $outfile,
            scalar @$files,
            @$files == 1 ? '' : 's';

    local @ARGV = @$files;
    print $out_fh $_ while <ARGV>;

    print "complete\n";
}

Upvotes: 3

rcedillo

Reputation: 356

I think it's easier if you categorize the files first; then you can work with them.

use warnings;
use strict;

use File::Spec;
use Data::Dumper;
use File::Basename;

my %hash = ();

my $testdir = './main_dir';
my @comp_list = glob("$testdir/*/*.txt");    # Perl's glob does not recurse on '**'

foreach my $file (@comp_list) {
    next unless $file =~ /(\w+\.\d\..+\.txt)/;    # skip files that don't match
    my @tmp = split /\./, $1;
    push @{ $hash{$tmp[-2]} }, $file;    # autovivifies the array ref on first use
}

print Dumper(\%hash);

Files:

main_dir
├── sub1
│   ├── File.1.age.txt
│   └── File.1.name.txt
└── sub2
    ├── File.2.age.txt
    └── File.2.name.txt

Result:

$VAR1 = {
          'age' => [
                     './main_dir/sub1/File.1.age.txt',
                     './main_dir/sub2/File.2.age.txt'
                   ],
          'name' => [
                      './main_dir/sub1/File.1.name.txt',
                      './main_dir/sub2/File.2.name.txt'
                    ]
        };

You can then create a loop over the hash to concatenate the files in each category and combine them.
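That final loop could be sketched as follows (a minimal, self-contained sketch: the demo setup stands in for the %hash built above, and the Combined.* and Final_list.txt names follow the question's layout):

```perl
use strict;
use warnings;
use File::Path 'make_path';

# Demo setup: hypothetical data standing in for the glob/categorize step
make_path('./main_dir/sub1', './main_dir/sub2');
my %hash = (
    age => [ './main_dir/sub1/File.1.age.txt', './main_dir/sub2/File.2.age.txt' ],
);
for my $f ( @{ $hash{age} } ) {
    open my $fh, '>', $f or die "Cannot write $f: $!";
    print $fh "data from $f\n";
    close $fh;
}

# Concatenate each category and record the combined names in Final_list.txt
open my $list_fh, '>', './main_dir/Final_list.txt' or die $!;
for my $type ( sort keys %hash ) {
    my $outfile = "./main_dir/Combined.$type.txt";
    open my $out_fh, '>', $outfile or die "Cannot open $outfile: $!";
    for my $infile ( @{ $hash{$type} } ) {
        open my $in_fh, '<', $infile or die "Cannot open $infile: $!";
        print $out_fh $_ while <$in_fh>;
        close $in_fh;
    }
    close $out_fh;
    print $list_fh "$outfile\n";
}
close $list_fh;
```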

Upvotes: 3
