user2503377

Reputation: 71

Print file names in sequence of generation in Perl

I have about 1000 files in a directory. The file names follow this convention:

TC_01_abcd_16_07_2014_14_06.txt
TC_02_abcd_16_07_2014_14_06.txt
TC_03_abcd_16_07_2014_14_07.txt
.
.
.
.
TC_100_abcd_16_07_2014_15_16.txt
.
.
.
TC_999_abcd_16_07_2014_17_06.txt

I have written some code like this

my $dir="/var/tmp";
foreach my $inputfile (glob("$dir/*abcd*.txt")) {
print $inputfile."\n";
}

When I run this, the names are not printed in sequence.

It prints up to the 09 file, then it prints the 100th file name, and so on:

TC_01_abcd_16_07_2014_11_55.txt
TC_02_abcd_16_07_2014_11_55.txt
TC_03_abcd_16_07_2014_11_55.txt
TC_04_abcd_16_07_2014_11_55.txt
TC_05_abcd_16_07_2014_11_56.txt
TC_06_abcd_16_07_2014_11_56.txt
TC_07_abcd_16_07_2014_11_56.txt
TC_08_abcd_16_07_2014_11_56.txt
TC_09_abcd_16_07_2014_11_56.txt
TC_100_abcd_16_07_2014_12_04.txt
TC_101_abcd_16_07_2014_12_04.txt
TC_102_abcd_16_07_2014_12_04.txt
TC_103_abcd_16_07_2014_12_04.txt
TC_104_abcd_16_07_2014_12_04.txt
TC_105_abcd_16_07_2014_12_04.txt
TC_106_abcd_16_07_2014_12_04.txt
TC_107_abcd_16_07_2014_12_04.txt
TC_108_abcd_16_07_2014_12_05.txt
TC_109_abcd_16_07_2014_12_05.txt
TC_10_abcd_16_07_2014_11_56.txt
TC_110_abcd_16_07_2014_12_05.txt
TC_111_abcd_16_07_2014_12_05.txt
TC_112_abcd_16_07_2014_12_05.txt
TC_113_abcd_16_07_2014_12_05.txt
TC_114_abcd_16_07_2014_12_05.txt
TC_115_abcd_16_07_2014_12_05.txt
TC_116_abcd_16_07_2014_12_05.txt
TC_117_abcd_16_07_2014_12_05.txt
TC_118_abcd_16_07_2014_12_05.txt
TC_119_abcd_16_07_2014_12_06.txt
TC_11_abcd_16_07_2014_11_56.txt

Please guide me on how to print them in sequence.

Upvotes: 0

Views: 133

Answers (4)

Brett Schneider

Reputation: 4103

The sort in the directory will be alphanumeric, hence your effect. I do not know how to sort glob by creation date, so here is a workaround:

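my $dir="/var/tmp";
my @files = glob("$dir/*abcd*.txt");
my @sorted_files;
for my $filename (@files) {
 # Use the number after TC_ as the array index
 my ($number) = $filename =~ m/TC_(\d+)_abcd/;
 next unless defined $number;
 $sorted_files[$number] = $filename;
}
# Skip the empty slots left by unused numbers
print join "\n", grep { defined } @sorted_files;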
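
On the creation-date idea: Perl's stat does not portably expose a creation time, so the sketch below assumes the modification time is a good enough stand-in for the order of generation. The helper name sort_by_mtime is made up for illustration, and files written within the same second fall back to name order.

```perl
use strict;
use warnings;

# Hypothetical helper: return the given files ordered by modification
# time, oldest first. Assumes mtime approximates the generation order.
sub sort_by_mtime {
    my @files = @_;
    return map  { $_->[1] }                                   # keep the name
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } # mtime, then name
           map  { [ (stat $_)[9] // 0, $_ ] }                  # [mtime, name]
           @files;
}

my $dir = "/var/tmp";    # directory taken from the question
print "$_\n" for sort_by_mtime(glob("$dir/*abcd*.txt"));
```

Element 9 of the list returned by stat is the mtime in epoch seconds; a file that cannot be read gets 0 and sorts first.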

Upvotes: 0

Mark Setchell

Reputation: 207465

Here you go:

#!/usr/bin/perl
use warnings;
use strict;

sub by_substring{
   # Capture the first run of digits in each name (0 if there is none)
   my ($x) = $a =~ /(\d+)/;
   my ($y) = $b =~ /(\d+)/;
   return ($x // 0) <=> ($y // 0);
}

my @files=<*.txt>;
@files = sort by_substring @files;
for my $inputfile (@files){
   print $inputfile."\n";
}

It will not matter if your filenames start with "TC" or "BD" or "President Carter", this will just use the first set of adjacent digits for the sorting.

Upvotes: 0

David W.

Reputation: 107040

That's printing the files in order -- ASCII order that is.

In ASCII, the underscore (_) is after the digits when sorting. If you want to sort your files in the correct order, you'll have to sort them yourself. Without sort, there's no guarantee that they'll print in any order. Even worse for you, you don't really want to print the files in either numeric sorted order (because the file names aren't numeric) or ASCII order (because you want TC_10 to print before TC_100).
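
To see the ASCII effect in isolation, here is a small sketch with three made-up names (shortened versions of the ones in the question). It only uses the default sort and ord:

```perl
use strict;
use warnings;

# Sample names are illustrative, not real files.
my @names = qw(TC_10_abcd.txt TC_100_abcd.txt TC_09_abcd.txt);

# '_' has a higher code than any digit, so "TC_10_" sorts after "TC_100".
printf "ord('_') = %d, ord('9') = %d\n", ord('_'), ord('9');

# Default sort compares strings character by character.
print "$_\n" for sort @names;
```

The comparison between TC_10_ and TC_100 is decided at the sixth character, where the underscore loses to the digit 0, which is why TC_100 comes out ahead of TC_10.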

Therefore, you need to write your own sorting routine. Perl gives you the sort function. By default, it will sort in ASCII order. However, you can define your own subroutine to sort in the order you want. sort will pass two values to your sort routine, $a and $b. What you can do is parse these two values to get the sort keys you want, then use the <=> or cmp operators to return the values in the correct sort order:

#! /usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw(say);

opendir my $dir, 'temp';      # Opens a directory for reading
my @dir_list = readdir $dir;
closedir $dir;

@dir_list = sort {   # My sort routine embedded inside the sort command
    my $a_val;
    my $b_val;
    if ( $a =~ /^TC_(\d+)_/ ) {
        $a_val = $1;
    }
    else {
        $a_val = 0;
    }
    if ( $b =~ /^TC_(\d+)_/ ) {
        $b_val = $1;
    }
    else {
        $b_val = 0;
    }
    return $a_val <=> $b_val;
} @dir_list;


for my $file (@dir_list) {
    next if $file =~ /^\./;
    say "$file";
}

In my sort subroutine, I take $a and $b, pull out the number you want to sort them by, and put that value into $a_val and $b_val. I also have to watch what happens if the files don't have the name I think they may have. Here I simply decide to set the sort value to 0 and hope for the best.

I am using opendir and readdir instead of globbing. This will end up including . and .. in my list, and it will include any file whose name starts with a dot (.). No problem, I'll remove these when I print out the list.

In my test, this prints out:

TC_01_abcd_16_07_2014_11_55.txt
TC_02_abcd_16_07_2014_11_55.txt
TC_03_abcd_16_07_2014_11_55.txt
TC_04_abcd_16_07_2014_11_55.txt
TC_05_abcd_16_07_2014_11_56.txt
TC_06_abcd_16_07_2014_11_56.txt
TC_07_abcd_16_07_2014_11_56.txt
TC_08_abcd_16_07_2014_11_56.txt
TC_09_abcd_16_07_2014_11_56.txt
TC_10_abcd_16_07_2014_11_56.txt
TC_11_abcd_16_07_2014_11_56.txt
TC_100_abcd_16_07_2014_12_04.txt
TC_101_abcd_16_07_2014_12_04.txt
TC_102_abcd_16_07_2014_12_04.txt
TC_103_abcd_16_07_2014_12_04.txt
TC_104_abcd_16_07_2014_12_04.txt
TC_105_abcd_16_07_2014_12_04.txt
TC_106_abcd_16_07_2014_12_04.txt
TC_107_abcd_16_07_2014_12_04.txt
TC_108_abcd_16_07_2014_12_05.txt
TC_109_abcd_16_07_2014_12_05.txt
TC_110_abcd_16_07_2014_12_05.txt
TC_111_abcd_16_07_2014_12_05.txt
TC_112_abcd_16_07_2014_12_05.txt
TC_113_abcd_16_07_2014_12_05.txt
TC_114_abcd_16_07_2014_12_05.txt
TC_115_abcd_16_07_2014_12_05.txt
TC_116_abcd_16_07_2014_12_05.txt
TC_117_abcd_16_07_2014_12_05.txt
TC_118_abcd_16_07_2014_12_05.txt
TC_119_abcd_16_07_2014_12_06.txt

Files are sorted numerically by the first set of digits after TC_.
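
With around 1000 files, the regular expressions inside the sort block run on every comparison. A possible variation, sketched here with a made-up helper name (sort_by_tc_number), extracts each key once and then sorts on it, a pattern usually called the Schwartzian transform. It keeps the same fallback of 0 for names that don't match, and breaks ties by name:

```perl
use strict;
use warnings;

# Hypothetical helper: sort names by the number after "TC_",
# extracting each sort key only once.
sub sort_by_tc_number {
    my @names = @_;
    return map  { $_->[1] }                                   # keep the name
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } # number, then name
           map  { [ (/TC_(\d+)_/ ? $1 : 0), $_ ] }             # [number, name]
           @names;
}

my @sample = qw(
    TC_100_abcd_16_07_2014_12_04.txt
    TC_10_abcd_16_07_2014_11_56.txt
    TC_09_abcd_16_07_2014_11_56.txt
);
print "$_\n" for sort_by_tc_number(@sample);
```

For the three sample names this prints the TC_09 file first, then TC_10, then TC_100.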

Upvotes: 0

harmic

Reputation: 30587

The files are sorted according to the rules of shell glob expansion, which is a simple alpha sort. You will need to sort them according to a numeric sort of the first numeric field.

Here is one way to do that:

# Declare a sort comparison sub, which extracts the part of the filename
# which we want to sort on and compares them numerically.
# This sub will be called by the sort function with the variables $a and $b
# set to the list items to be compared
sub compareFilenames {
        my ($na) = ($a =~ /TC_(\d+)/);
        my ($nb) = ($b =~ /TC_(\d+)/);
        return $na <=> $nb;
}

# Now use glob to get the list of filenames, but sort them
# using this comparison
foreach my $file (sort compareFilenames glob("$dir/*abcd*.txt")) {
        print "$file\n";
}

See: perldoc for sort

Upvotes: 1
