frozenthorn

Reputation: 87

Need help sorting file list based on datestamp in filename

Unsorted data

5CM00225_10_16_2017_10_54_42.xml
5CM10538_10_16_2017_11_04_18.xml
1ZM06004_10_16_2017_11_04_14.xml
5XM10010_10_17_2017_08_00_47.xml
5ZM05391_10_15_2017_08_51_07.xml
5ZM05388_10_17_2017_08_01_06.xml
5ZM00058_10_17_2017_08_00_49.xml
NMC00166_10_15_2017_08_51_06.xml
5CM10538_10_15_2017_08_51_06.xml

Expected results

NMC00166_10_15_2017_08_51_06.xml
5CM10538_10_15_2017_08_51_06.xml
5ZM05391_10_15_2017_08_51_07.xml
5CM00225_10_16_2017_10_54_42.xml
1ZM06004_10_16_2017_11_04_14.xml
5CM10538_10_16_2017_11_04_18.xml
5XM10010_10_17_2017_08_00_47.xml
5ZM00058_10_17_2017_08_00_49.xml
5ZM05388_10_17_2017_08_01_06.xml

I use Net::SFTP to get a directory listing off a remote site and compare to a local file listing. I'd like to sort the list by date in the filename, but I'm running into issues due to there being other information in the string that I need to ignore.

my $sftp = Net::SFTP->new( $host,  %args);

my @list = $sftp->ls($path);

open(my $fh, '>', $file); # open a log file to save remote directory listing

    my @sorted = map  { $_->[0] }
         sort { $a->[1] <=> $b->[1] }
         map  { [$_, $_=~/(\d{2})_(\d{2})_(\d{4})_(\d{2})_(\d{2})_(\d{2})/] } # unsuccessful sorting attempt
         @list;

    foreach my $item (@sorted) {
        $i = ${item}->{filename};                               
        print $fh "$i\n"; # prints each record to the open log file
    }
close $fh;

I have done sorting before and plenty of regex but never at the same time, and I'm clearly bungling it up, because it isn't sorting anything, and not throwing any errors.

I thought about extracting the MM_DD_YYYY_hh_mm_ss part out of each string and using it as the sort key, but I didn't make any usable headway, so I scrapped the idea.

Upvotes: 2

Views: 97

Answers (4)

Polar Bear

Reputation: 6798

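The timestamp combined with the leading ID (the text before the first underscore) can be used as a hash key.

Then it is just a matter of sorting the hash on its keys and printing the values. Note that the key starts with the month, so this plain string sort is chronological only while all files fall within the same year, and files with identical timestamps are ordered by ID.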

use strict;
use warnings;
use feature 'say';

my %hash;

while(<DATA>) {
    chomp;
    next unless /(.+?)_(.+?)\.xml/;
    $hash{"$2_$1"} = $_;
}

say $hash{$_} for sort keys %hash;

__DATA__
5CM00225_10_16_2017_10_54_42.xml
5CM10538_10_16_2017_11_04_18.xml
1ZM06004_10_16_2017_11_04_14.xml
5XM10010_10_17_2017_08_00_47.xml
5ZM05391_10_15_2017_08_51_07.xml
5ZM05388_10_17_2017_08_01_06.xml
5ZM00058_10_17_2017_08_00_49.xml
NMC00166_10_15_2017_08_51_06.xml
5CM10538_10_15_2017_08_51_06.xml

Output

5CM10538_10_15_2017_08_51_06.xml
NMC00166_10_15_2017_08_51_06.xml
5ZM05391_10_15_2017_08_51_07.xml
5CM00225_10_16_2017_10_54_42.xml
1ZM06004_10_16_2017_11_04_14.xml
5CM10538_10_16_2017_11_04_18.xml
5XM10010_10_17_2017_08_00_47.xml
5ZM00058_10_17_2017_08_00_49.xml
5ZM05388_10_17_2017_08_01_06.xml
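
If the listing can span more than one year, the same hash idea still works as long as the key is built year-first. The following is a minimal sketch under that assumption; the 2018 filename and the name `%by_key` are made up for illustration and are not part of the original data.

```perl
use strict;
use warnings;

# Sample names; the 2018 entry is hypothetical, added to show a year change.
my @files = (
    '5CM10538_01_02_2018_08_00_00.xml',
    '5CM00225_10_16_2017_10_54_42.xml',
    'NMC00166_10_15_2017_08_51_06.xml',
);

my %by_key;
for my $name (@files) {
    # Capture the ID, then month, day, year, and the hh_mm_ss tail.
    next unless $name =~ /^(.+?)_(\d{2})_(\d{2})_(\d{4})_(\d{2}_\d{2}_\d{2})\.xml$/;

    # Year first, so plain string order equals chronological order;
    # the ID at the end keeps keys unique for identical timestamps.
    my $key = join '_', $4, $2, $3, $5, $1;
    $by_key{$key} = $name;
}

my @sorted = map { $by_key{$_} } sort keys %by_key;
print "$_\n" for @sorted;
```

Because every field is zero-padded to a fixed width, comparing the keys as strings gives the same order as comparing the dates.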

Upvotes: 0

zdim

Reputation: 66881

To parse and compare dates it also makes sense to use a date-time module, Time::Piece here.

A naive version (see below for a more efficient one)

use warnings;
use strict;
use feature 'say';

use Time::Piece;

my @orig = ( 
    '5CM00225_10_16_2017_10_54_42.xml',
    '5CM10538_10_16_2017_11_04_18.xml',
    '1ZM06004_10_16_2017_11_04_14.xml',
    '5XM10010_10_17_2017_08_00_47.xml',
    '5ZM05391_10_15_2017_08_51_07.xml',
    '5ZM05388_10_17_2017_08_01_06.xml',
    '5ZM00058_10_17_2017_08_00_49.xml',
    'NMC00166_10_15_2017_08_51_06.xml',
    '5CM10538_10_15_2017_08_51_06.xml',
);

my $dt = Time::Piece->new;

my @sorted = sort {
    my $a_dt = $dt->strptime($a =~ /_(.*)\./, '%m_%d_%Y_%H_%M_%S');
    my $b_dt = $dt->strptime($b =~ /_(.*)\./, '%m_%d_%Y_%H_%M_%S');
    $a_dt <=> $b_dt
} @orig;

say for @sorted;

This runs a regex and strptime twice for every comparison, so the same filename gets parsed many times over.

Instead, precompute them all once:

my @sorted =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ $dt->strptime(/_(.*)\./, '%m_%d_%Y_%H_%M_%S'),  $_ ] }
    @orig;

This extracts the date-time portion of the string and builds a date-time object from it with strptime, placing it in an arrayref together with the original string. It does this for the whole input using map.

Then that list is passed to sort which sorts it by its first element, where the Time::Piece object's builtin comparison is used. Then the second map pulls the original strings out, for our result.
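
In the question the loop reads `$item->{filename}`, which suggests the elements returned by `ls` are hash references rather than plain strings. A minimal sketch of the precomputed sort adapted to that shape follows. The hand-built `@list` stands in for the real remote listing, and only the `filename` key is assumed; the `.` entry is included to show how names without a datestamp can be skipped.

```perl
use strict;
use warnings;
use Time::Piece;

# Stand-in for the remote listing: hashrefs with a 'filename' key (assumed shape).
my @list = map { +{ filename => $_ } } (
    '5XM10010_10_17_2017_08_00_47.xml',
    '.',
    '5ZM05391_10_15_2017_08_51_07.xml',
    '5CM00225_10_16_2017_10_54_42.xml',
);

my @sorted =
    map  { $_->[1] }                  # 3. recover the original entries
    sort { $a->[0] <=> $b->[0] }      # 2. compare the precomputed epoch seconds
    map  {                            # 1. parse each filename exactly once
        my ($stamp) = $_->{filename} =~ /_(\d{2}_\d{2}_\d{4}_\d{2}_\d{2}_\d{2})\.xml$/;
        defined $stamp
            ? [ Time::Piece->strptime($stamp, '%m_%d_%Y_%H_%M_%S')->epoch, $_ ]
            : ()                      # drop entries that carry no datestamp
    }
    @list;

print "$_->{filename}\n" for @sorted;
```

Storing the epoch value rather than the object keeps the comparison a plain numeric one; sorting on the Time::Piece objects themselves, as above, is equally valid.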

Upvotes: 1

Andrey

Reputation: 1818

Probably not the prettiest solution but it works:

use strict;
use warnings;
use Data::Dumper;

my @list = (
    '5CM00225_10_16_2017_10_54_42.xml',
    '5CM10538_10_16_2017_11_04_18.xml',
    '1ZM06004_10_16_2017_11_04_14.xml',
    '5XM10010_10_17_2017_08_00_47.xml',
    '5ZM05391_10_15_2017_08_51_07.xml',
    '5ZM05388_10_17_2017_08_01_06.xml',
    '5ZM00058_10_17_2017_08_00_49.xml',
    'NMC00166_10_15_2017_08_51_06.xml',
    '5CM10538_10_15_2017_08_51_06.xml'
);

my @sorted = sort {
    my ($mm1,$dd1,$yy1,$hh1,$min1,$ss1) = ($a =~ /_(\d{2})_(\d{2})_(\d{4})_(\d{2})_(\d{2})_(\d{2})\.xml$/);
    my ($mm2,$dd2,$yy2,$hh2,$min2,$ss2) = ($b =~ /_(\d{2})_(\d{2})_(\d{4})_(\d{2})_(\d{2})_(\d{2})\.xml$/);
    my $x = $yy1.$mm1.$dd1.$hh1.$min1.$ss1;
    my $y = $yy2.$mm2.$dd2.$hh2.$min2.$ss2;
    $x <=> $y;
} @list;

print Dumper(\@sorted);

Upvotes: 1

toolic

Reputation: 62064

This produces your desired output. It splits each line on underscore or period into a list, then only keeps the "columns" you want, in the order you want them. It keeps the year, followed by the month, day, etc. Then it joins the list elements into a new date string, then sorts lines based on dates.

use warnings;
use strict;

my @list;
while (<DATA>) {
    chomp;
    push @list, $_;
}

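# Build [line, YYYYMMDDhhmmss] pairs, sort numerically on the key,
# then pull the original lines back out.
my @sorted = map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map  { [$_, join '', (split /[_.]/)[3,1,2,4,5,6] ] }
    @list;

print "$_\n" for @sorted;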

__DATA__
5CM00225_10_16_2017_10_54_42.xml
5CM10538_10_16_2017_11_04_18.xml
1ZM06004_10_16_2017_11_04_14.xml
5XM10010_10_17_2017_08_00_47.xml
5ZM05391_10_15_2017_08_51_07.xml
5ZM05388_10_17_2017_08_01_06.xml
5ZM00058_10_17_2017_08_00_49.xml
NMC00166_10_15_2017_08_51_06.xml
5CM10538_10_15_2017_08_51_06.xml

I believe your code fails because the match returns the captured fields in the order they appear in the filename, namely month, day, year, etc. Your sort then compares only `$a->[1]`, which is the month.
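
To make that concrete, here is a small sketch that builds the same pair the question's inner `map` would build for one filename and prints what ends up in the sort slot. The variable names `$pair` and `$key` are illustrative only. Separately, if the elements of `@list` are hash references, as the later `$item->{filename}` suggests, the pattern would have to be applied to `$_->{filename}` rather than to `$_`; that part is an assumption about the listing's shape.

```perl
use strict;
use warnings;

my $name = '5CM00225_10_16_2017_10_54_42.xml';

# In list context the match returns all six captures, in filename order,
# so the array holds the name followed by month, day, year, hour, minute, second.
my $pair = [ $name, $name =~ /(\d{2})_(\d{2})_(\d{4})_(\d{2})_(\d{2})_(\d{2})/ ];

print scalar(@$pair), " elements, sort key: $pair->[1]\n";   # 7 elements, sort key: 10

# Reordering the captures year-first gives a single number that sorts correctly.
my @fields = @{$pair}[1 .. 6];
my $key    = join '', @fields[2, 0, 1, 3, 4, 5];

print "$key\n";   # 20171016105442
```

Every sample file is from month 10, so comparing on `$pair->[1]` alone treats all entries as equal and the list comes back in its original order.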

Upvotes: 4
