programminglearner

Reputation: 542

How to use timestamp to get recent files in Perl

I have a folder filled with log files. Logs are generated weekly, for example:

/path/week20a.log
/path/week20b.log
/path/week29d.log
/path/week30c.log
/path/week31a.log
/path/week32a.log

I want to get the most recent log from last week (meaning the latest log file from the most recent week before this week's) and the most recent log from the week before that. In this case,

/path/week21d.log
/path/week20c.log

I have two subroutines for this that look like:

sub getweek {
$week = ???; #where week should return one of the 'week***' listed above
my @files = File::Find::Rule->file()
                            ->name('*$week.log')
                            ->in(mydir);

my @files_with_mtimes = map +{ name => $_, mtime => (stat $_)[9] }, @files;
our @sorted_files = reverse sort { $a->{mtime} <=> $b->{mtime} } @files_with_mtimes;

return $sorted_files[0]{name};
}



The problem is that I'm getting the current week using Time::Piece->new->strftime("%V") and then subtracting 2 to get the week before last. That hardcodes the assumption that a log from two weeks ago will always exist and will be the one before last. What if there was no run last week? In that case, the last log is from the week before that, and the one before it is from an even earlier week.

How can I have two subroutines, where one gets last week's log (Time::Piece->new->strftime("%V") - 1) if it exists and, if not, keeps decrementing until it finds a week that has one, and the other, based on that result, does something similar to find the last log before that one?

Upvotes: 0

Views: 298

Answers (2)

zdim

Reputation: 66883

My take on the problem: Find latest weekly files for the last two weeks for which there are files.

One way: Sort all files by timestamps, group into weeks and take latest from each, for the last two.

use warnings;
use strict;
use feature 'say', 'state';
use List::MoreUtils qw(part);
use Time::Piece;

my $dir = shift;
die "Usage: $0 directory\n" if not $dir or not -d $dir;

my @files =                       # arrayrefs: name, secs since epoch    
    sort { $b->[1] <=> $a->[1] }
    map { [$_, (stat $_)[9]] } 
    grep { -f } 
        glob "\Q$dir\E/*.log";

my $dt = Time::Piece->localtime;
my $curr_week = $dt->week;
my $curr_yr   = $dt->year;

my @parts = part {
    state $this_week = $curr_week;
    my $t = $dt->strptime($_->[1], "%s");
    if ($t->year != $curr_yr) {
        $this_week += 51 ;
        $curr_yr = $t->year;
    }
    $this_week - $t->week;  # partition index: week offset
} @files;

# Remove the first element if it is for the current week
shift @parts  if $parts[0] and 
    Time::Piece->strptime($parts[0]->[0][1], "%s")->week == $curr_week;

my @last_in_weeks = map { $_->[0] // () } grep { defined } @parts;    
say $_->[0] for @last_in_weeks[0,1];

This can be optimized, in the first place by cutting off week-based partitioning of all files as soon as we have the needed number of weeks (two in this case).

Comments

  • Filelist is built assuming that all files are directly in the given directory. The \Q...\E in glob is there to deny a (rare but possible) injection bug. Since it also quotes possible spaces in directory names we don't need to double quote the glob

  • Files are sorted in reverse by modification time, and since we'll need the timestamps later they are kept, so @files carries two-element arrayrefs. A more convenient but less efficient option is to pack the name and timestamp in a hash

  • List::MoreUtils::part assigns elements into groups, which are arrayrefs in the returned list. They are indexed by what the block returns; so files in the week with offset 2 (returned from the block) go into the arrayref which is the third element of the returned list. Thus there are undef elements when there are no files for some weeks

  • For partitioning, the week of each file is subtracted from the $curr_week so that the partition index starts at the most recent week. (Then the first element of @parts is removed if it has files from the current week, since the current week's logs are not wanted.) However, ...

  • ... Time::Piece::week returns the week number within the file's own year. So a file from the end of January may have a $t->week of 3, which subtracts nicely from $this_week (32 at the time of this writing) for index 29; but the next file we process, from the end of December, is in week 51! A negative offset is a no-go for part, so $this_week needs to be increased by 51 every time the year changes
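The year-boundary problem from the last bullet can be seen in a minimal sketch. The two dates are made up for illustration, and the constant 51 simply mirrors the choice in the code above; since a year has 52 or 53 ISO weeks, the corrected value is only guaranteed to be non-negative, not an exact week count.

```perl
use strict;
use warnings;
use Time::Piece;

# Illustrative dates only (assumptions, not taken from the question's data)
my $now  = Time::Piece->strptime('2019-01-25', '%Y-%m-%d');
my $file = Time::Piece->strptime('2018-12-20', '%Y-%m-%d');

# Plain subtraction goes negative once the file is from the previous year
my $naive = $now->week - $file->week;

# The correction used above: bump the reference week when the year changes
my $ref_week = $now->week;
$ref_week += 51 if $file->year != $now->year;
my $offset = $ref_week - $file->week;

printf "now: week %d, file: week %d, naive: %d, corrected: %d\n",
    $now->week, $file->week, $naive, $offset;
```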

This code, and the version below, uses timestamps to find out the week. If the week should be pulled from the filename instead, then replace $dt->week with a simple regex prying the week number out of a filename, and take the first file for that week (if sorted in reverse). Also drop the $dt altogether and year-based considerations aren't needed either; it's much simpler that way.
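As a rough sketch of that filename-based variant: the helper name `latest_by_name_week`, the `weekNN` naming pattern, and the sample timestamps below are all assumptions for illustration. It expects the same newest-first list of `[name, mtime]` arrayrefs and ignores year wrap-around entirely.

```perl
use strict;
use warnings;

# Hypothetical helper: from [name, mtime] arrayrefs sorted newest-first,
# return the newest file of each of the $count most recent weeks before
# $curr_week, reading the week number from the file name.
sub latest_by_name_week {
    my ($files, $curr_week, $count) = @_;
    my (%seen, @latest);
    for my $rf (@$files) {
        my ($week) = $rf->[0] =~ m{week(\d+)[^/]*\.log\z} or next;
        next if $week >= $curr_week;   # skip the current week's logs
        next if $seen{$week}++;        # only the first (newest) per week
        push @latest, $rf->[0];
        last if @latest >= $count;
    }
    return @latest;
}

# Names from the question, with made-up mtimes (newest first)
my @sample = (
    ['/path/week32a.log', 600], ['/path/week31a.log', 500],
    ['/path/week30c.log', 400], ['/path/week29d.log', 300],
    ['/path/week20b.log', 200], ['/path/week20a.log', 100],
);
print "$_\n" for latest_by_name_week(\@sample, 32, 2);
```

Because the list is already sorted by modification time, the first file seen for a week is taken as that week's latest, which assumes the timestamps agree with the week numbers in the names.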


The early cutoff is best done manually, in a plain loop, since part can be cleanly interrupted only by throwing an exception (then caught with eval), and exceptions in general shouldn't be used for flow control.

my $dt = Time::Piece->localtime;

my ($week, $prev_week) = ($dt->week) x 2;
my $prev_year = $dt->year;
my @latest_weekly;

foreach my $rf (@files) {
    $dt = $dt->strptime($rf->[1], "%s");

    if ($dt->year != $prev_year) {
        $prev_week += 51;
        $prev_year = $dt->year;
    }

    # New week? This first file in the new week is the latest one
    if ($dt->week < $prev_week) {
        push @latest_weekly, $rf;
        last if @latest_weekly >= 2;  # really take only two
        $prev_week = $dt->week;       # so later files from this week are skipped
    }
}

say $_->[0] for @latest_weekly;

This is no harder and is far more efficient (it's more efficient even without cutting it off at the second week). However, I think the code using part is more general and maintainable -- more easily changed to meet other ends.

Upvotes: 4

ikegami

Reputation: 385655

use File::Basename qw( basename );

my @qfns = ...;
my $target_week = ...;

my %qfns_by_week;
for my $qfn (@qfns) {
   my $fn = basename($qfn);
   my ($week) = $fn =~ /(\d+)/
      or warn("Skipping $qfn: Unrecognized format\n"), next;

   next if $week > $target_week;

   push @{ $qfns_by_week{$week} }, $qfn;
}

my ($week2, $week1) = sort { $b <=> $a } keys(%qfns_by_week);

my @latest_qfns;
push @latest_qfns, ( reverse sort @{ $qfns_by_week{$week1} } )[0] if defined($week1);
push @latest_qfns, ( reverse sort @{ $qfns_by_week{$week2} } )[0] if defined($week2);

Note that the final two lines assume that the path and the leading part of the file name are the same for all logs of the same week.
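A small sketch of why that assumption matters, using made-up paths: a plain string sort compares the whole qualified name, so a differing directory prefix decides the result before the letter suffix is reached. Comparing on basename, shown last, is one possible workaround and is not part of the code above.

```perl
use strict;
use warnings;
use File::Basename qw( basename );

# Hypothetical file names, for illustration only
my @same_dir = qw( /path/week20a.log /path/week20c.log /path/week20b.log );
my $latest   = ( reverse sort @same_dir )[0];   # suffix decides: week20c

# With different directories, the path prefix dominates the comparison
my @mixed  = qw( /b/week20a.log /a/week20c.log );
my $picked = ( reverse sort @mixed )[0];        # picks the /b/ file

# Possible workaround: compare on the file name only, highest first
my $by_name = ( sort { basename($b) cmp basename($a) } @mixed )[0];

print "$latest\n$picked\n$by_name\n";
```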

Upvotes: 3
