GregH

Reputation: 12868

What is the regex to extract a date mask in Perl?

I have a string in Perl that contains a directory specification. If the string contains any one of the substrings that make up a date mask, or any combination of them, I want to extract that substring. For example, the directory spec may be:

/mydir/data/YYYYMMDD

I want to be able to extract the "YYYYMMDD" string. However, that portion of the path could be any one of the following strings, or any combination of them:

YY
YYYY
MM
DD

So the directory spec string could read:

   /mydir/data/DD/data2

and I want the "DD" returned as a result of the regex comparison. How do I capture the string when it must contain one or more of those date mask strings and that string must be between two "/" characters or exist at the end of the string?

Upvotes: 3

Views: 1003

Answers (4)

daxim

Reputation: 39158

I'm assuming that YYYY and YY never both appear in the same pattern, because that combination would not make sense.

use Data::Munge qw(list2re);
use List::MoreUtils qw(uniq);
use Algorithm::Combinatorics qw(variations);
use Perl6::Take qw(gather take);

list2re
uniq
gather {
    for my $n ([qw(YYYY MM DD)], [qw(YY MM DD)]) {
        for my $k (1..scalar @$n) {
            take map { join q(), @$_ } variations($n, $k)
        }
    }
}

The expression returns the regex (?^:DDMMYYYY|DDYYYYMM|MMDDYYYY|MMYYYYDD|YYYYDDMM|YYYYMMDD|DDMMYY|DDYYMM|DDYYYY|MMDDYY|MMYYDD|MMYYYY|YYDDMM|YYMMDD|YYYYDD|YYYYMM|DDMM|DDYY|MMDD|MMYY|YYDD|YYMM|YYYY|DD|MM|YY). (Semi-)Functional programming for the win!
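For anyone without those CPAN modules, here is a minimal core-Perl sketch of the same idea: build every arrangement of the fields, de-duplicate, and join them into an alternation. It also anchors the result between "/" and either "/" or the end of the string, as the question asks. The names arrangements and extract_mask are made up for this sketch and are not part of the answer above.

```perl
use strict;
use warnings;

# Every ordered arrangement of 1..N distinct items, joined into one string.
sub arrangements {
    my @items = @_;
    my @out;
    for my $i (0 .. $#items) {
        my @rest = @items[ grep { $_ != $i } 0 .. $#items ];
        push @out, $items[$i];
        push @out, map { $items[$i] . $_ } arrangements(@rest);
    }
    return @out;
}

# Build the mask list from both field sets and drop duplicates such as "MMDD".
my %seen;
my @masks = grep { !$seen{$_}++ }
            map  { arrangements(@$_) } [qw(YYYY MM DD)], [qw(YY MM DD)];

# Longest alternatives first, then anchor between "/" and "/" or end of string.
my $alternation = join '|',
    sort { length($b) <=> length($a) or $a cmp $b } @masks;
my $mask_re = qr{/($alternation)(?=/|\z)};

# extract_mask() is a hypothetical helper name used only for this sketch.
sub extract_mask {
    my ($path) = @_;
    my ($mask) = $path =~ $mask_re;
    return $mask;    # undef when no path segment is a date mask
}

print extract_mask('/mydir/data/YYYYMMDD'), "\n";    # YYYYMMDD
print extract_mask('/mydir/data/DD/data2'), "\n";    # DD
```

Because the alternation is followed by a lookahead for "/" or the end of the string, a segment such as YYYYYY is rejected rather than partially matched.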

Upvotes: 4

Borodin

Reputation: 126722

Assuming the mask fields are always in the order Y - M - D, this will do what you need:

my ($mask) = $path =~ m{ / ( (?:YY){0,2} (?:MM)? (?:DD)? ) (?:/|$) }x;
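One thing to be aware of: every part of that pattern is optional, so an empty path segment (a trailing "/" or a "//") matches and yields an empty string. The sketch below wraps the same pattern in a function and adds a (?=[YMD]) lookahead as a guard. The lookahead and the name ymd_mask are my additions for illustration, not part of the pattern above.

```perl
use strict;
use warnings;

# ymd_mask() is a hypothetical helper name used only for this sketch.
sub ymd_mask {
    my ($path) = @_;

    # (?=[YMD]) is an added guard: it requires at least one mask letter
    # after the slash, so an empty segment can no longer match.
    my ($mask) = $path =~ m{
        / (?=[YMD])
        ( (?:YY){0,2} (?:MM)? (?:DD)? )   # fields in Y - M - D order only
        (?:/|$)
    }x;

    return $mask;    # undef when nothing matched
}

for my $path (qw( /mydir/data/YYYYMMDD /mydir/data/DD/data2 /mydir/data/ )) {
    my $mask = ymd_mask($path);
    printf "%-22s %s\n", $path, defined $mask ? $mask : '(no mask)';
}
```

A segment such as DDMM returns nothing here, which is consistent with the stated Y - M - D ordering assumption.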

Upvotes: 1

choroba

Reputation: 241898

I'd use

my ($date) = m{/([0-9]{2,8})(?:/|$)};

and check whether

not(length($date) % 2)   # $date has even length

and maybe some checks for valid combinations.

Update: OK, to just get the mask, not the numbers, you can change this to

my ($date) = m{/([YMD]{2,8})(?:/|$)};
my $check = $date;
$check =~ s/YYYY/y/;
$check =~ s/MM//;
$check =~ s/DD//;
print "Matches $date\n" if grep $_ eq $check, (q{}, 'y', 'YY');

This should exclude all invalid combinations like YYDDYY or YYYYMMYY and so on.
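One caveat with the substitution check: because the replacements run one after another, letters that were not adjacent can end up joined. For example, DMMD becomes DD after the MM step and then an empty string, so it passes. Below is a rough sketch of a different check that splits the segment into whole fields and rejects repeats. It is a swapped-in technique, not the code above, and checked_mask is a made-up name.

```perl
use strict;
use warnings;

# checked_mask() is a hypothetical helper name used only for this sketch.
sub checked_mask {
    my ($path) = @_;

    # Same candidate extraction as above: a segment made only of Y, M and D.
    my ($date) = $path =~ m{/([YMD]{2,8})(?:/|$)} or return;

    # Split into whole fields from the left; \G keeps the matches contiguous.
    my @fields = $date =~ /\G(YYYY|YY|MM|DD)/g;

    # Leftover characters mean the segment was not built from whole fields.
    return unless length($date) == length(join '', @fields);

    # Each kind of field (year, month, day) may appear at most once.
    my %count;
    $count{ substr $_, 0, 1 }++ for @fields;
    return if grep { $_ > 1 } values %count;

    return $date;
}

for my $path (qw( /mydir/data/YYYYMMDD /mydir/data/DMMD /mydir/data/YYDDYY )) {
    my $mask = checked_mask($path);
    printf "%-22s %s\n", $path, defined $mask ? $mask : '(rejected)';
}
```

Like the original, this only looks at the first path segment made up of Y, M and D characters.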

Upvotes: 0

tadmc

Reputation: 3744

I assume that there is only one "date" component, or if not, that you want the 1st one:

#!/usr/bin/perl
use warnings;
use strict;

my @paths = qw(
    /mydir/data/YYYYMMDD
    /mydir/data/YY/data2
    /mydir/data/YYMM/data2
    /mydir/data/DD/data2
);

foreach my $path (@paths) {
    my($date) = grep /^(([YMD])\2)+$/, split '/', $path;
    print "$path: $date\n";
}
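If a path can hold more than one date component, the same grep can return all of them rather than just the first. This is a small sketch under that assumption, and mask_segments is a made-up name. Note that this pattern only checks for doubled letters, so a segment like MMMM is accepted too.

```perl
use strict;
use warnings;

# mask_segments() is a hypothetical helper name used only for this sketch.
sub mask_segments {
    my ($path) = @_;

    # Keep every path segment made up entirely of doubled Y, M or D letters.
    return grep { /^(?:([YMD])\1)+$/ } split m{/}, $path;
}

my @found = mask_segments('/mydir/YYYY/MM/DD/file');
print join(',', @found), "\n";    # YYYY,MM,DD
```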

Upvotes: 1
