Reputation: 12868
I have a string in Perl that contains a directory specification. If the string contains one or more substrings that together make up a date mask, I want to extract that part. For example, the directory spec may be:
/mydir/data/YYYYMMDD
I want to be able to extract the "YYYYMMDD" string. However, that portion of the path could be any one of the following strings, or a combination of them:
YY
YYYY
MM
DD
So the directory spec string could read:
/mydir/data/DD/data2
and I want "DD" returned as the result of the regex match. How do I capture the string, given that it must consist of one or more of those date mask strings and must either sit between two "/" characters or come at the end of the string?
Upvotes: 3
Views: 1003
Reputation: 39158
I'm making the assumption that YYYY and YY shall not both appear in the same pattern, because otherwise it does not make sense.
use Data::Munge qw(list2re);
use List::MoreUtils qw(uniq);
use Algorithm::Combinatorics qw(variations);
use Perl6::Take qw(gather take);
list2re
    uniq
    gather {
        for my $n ([qw(YYYY MM DD)], [qw(YY MM DD)]) {
            for my $k (1..scalar @$n) {
                take map { join q(), @$_ } variations($n, $k)
            }
        }
    }
The expression returns the regex (?^:DDMMYYYY|DDYYYYMM|MMDDYYYY|MMYYYYDD|YYYYDDMM|YYYYMMDD|DDMMYY|DDYYMM|DDYYYY|MMDDYY|MMYYDD|MMYYYY|YYDDMM|YYMMDD|YYYYDD|YYYYMM|DDMM|DDYY|MMDD|MMYY|YYDD|YYMM|YYYY|DD|MM|YY). (Semi-)Functional programming for the win!
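The generated alternation on its own does not require the mask to sit between two "/" characters or at the end of the string. The following is a minimal sketch of one way to add that anchoring, and it builds the same 26 alternatives with core Perl only instead of the CPAN modules above. The helper name variations_of and the variables $alt and $re are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: every ordered selection of $k distinct items
# from @$items, each returned as an array reference.
sub variations_of {
    my ($items, $k) = @_;
    return ([]) if $k == 0;
    my @out;
    for my $i (0 .. $#$items) {
        my @rest = @$items;
        my ($head) = splice @rest, $i, 1;   # take item $i, keep the others
        push @out, map { [ $head, @$_ ] } variations_of(\@rest, $k - 1);
    }
    return @out;
}

# Collect the unique masks from both field sets (YYYY and YY never mix).
my (%seen, @masks);
for my $set ([qw(YYYY MM DD)], [qw(YY MM DD)]) {
    for my $k (1 .. @$set) {
        push @masks, grep { !$seen{$_}++ }
                     map  { join q(), @$_ } variations_of($set, $k);
    }
}

# Longest alternatives first, then anchor between "/" and "/" or end of string.
my $alt = join '|', sort { length $b <=> length $a or $a cmp $b } @masks;
my $re  = qr{/($alt)(?:/|$)};

for my $path (qw( /mydir/data/YYYYMMDD /mydir/data/DD/data2 )) {
    print "$path: $1\n" if $path =~ $re;
}
```

Because the capture must be followed by "/" or the end of the string, a component such as YYDDYY, which is not one of the generated alternatives, is not matched in part.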
Upvotes: 4
Reputation: 126722
Assuming the mask fields are always in the order Y - M - D, this will do what you need:
my ($mask) = $path =~ m{ / ( (?:YY){0,2} (?:MM)? (?:DD)? ) (?:/|$) }x;
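As a rough sketch of how this pattern behaves on the question's sample paths: every field in it is optional, so the capture can be an empty string (for example when the path ends in "/"). The wrapper below, with the hypothetical name extract_mask, treats an empty capture as "no mask". It assumes paths without doubled slashes, since an empty component earlier in the path would match first and hide a later mask.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the date mask found in $path, or nothing.
sub extract_mask {
    my ($path) = @_;
    my ($mask) = $path =~ m{ / ( (?:YY){0,2} (?:MM)? (?:DD)? ) (?:/|$) }x;
    # All fields are optional, so reject an empty capture explicitly.
    return $mask if defined $mask && length $mask;
    return;
}

for my $path (qw( /mydir/data/YYYYMMDD /mydir/data/DD/data2 /mydir/data )) {
    my $mask = extract_mask($path);
    print "$path: ", ( defined $mask ? $mask : '(no mask)' ), "\n";
}
```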
Upvotes: 1
Reputation: 241898
I'd use
my ($date) = m{/([0-9]{2,8})(?:/|$)}
and check whether
not(length($date) % 2) # $date has even length
and maybe some checks for valid combinations.
Update: OK, to just get the mask, not the numbers, you can change this to
my ($date) = m{/([YMD]{2,8})(?:/|$)};
my $check = $date;
$check =~ s/YYYY/y/;
$check =~ s/MM//;
$check =~ s/DD//;
print "Matches $date\n" if grep $_ eq $check, (q{}, 'y', 'YY');
This should exclude all invalid combinations like YYDDYY or YYYYMMYY and so on.
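The two steps above can be folded into one subroutine. This is only a sketch: the name valid_mask is hypothetical, and it looks at the first path component made up solely of Y, M and D characters, so a later valid mask is not considered if that first candidate is rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the mask if it is a valid combination,
# otherwise returns nothing.
sub valid_mask {
    my ($path) = @_;
    my ($date) = $path =~ m{/([YMD]{2,8})(?:/|$)};
    return unless defined $date;

    my $check = $date;
    $check =~ s/YYYY/y/;   # mark one four-digit year
    $check =~ s/MM//;      # drop one month field
    $check =~ s/DD//;      # drop one day field

    # What may remain: nothing, a four-digit year, or a two-digit year.
    return $date if grep { $_ eq $check } q{}, 'y', 'YY';
    return;
}

for my $path (qw( /mydir/data/YYYYMMDD /mydir/data/DD/data2 /mydir/data/YYDDYY )) {
    my $mask = valid_mask($path);
    print "$path: ", ( defined $mask ? $mask : '(invalid or none)' ), "\n";
}
```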
Upvotes: 0
Reputation: 3744
I assume that there is only one "date" component or, if there are several, that you want the first one:
#!/usr/bin/perl
use warnings;
use strict;
my @paths = qw(
    /mydir/data/YYYYMMDD
    /mydir/data/YY/data2
    /mydir/data/YYMM/data2
    /mydir/data/DD/data2
);
foreach my $path (@paths) {
    my($date) = grep /^(([YMD])\2)+$/, split '/', $path;
    print "$path: $date\n";
}
Upvotes: 1