tjwrona

Reputation: 9035

Recognizing date strings in Perl

I have a script that is processing a lot of data. Some of the data fields coming in are dates or timestamps.

When I run into a date/timestamp I need to convert it from the local time to GMT. Obviously I don't want to attempt this conversion if the field is not a date or timestamp. The problem is, I don't know what format the date or timestamp field will have.


Scalar::Util has a looks_like_number function to determine whether a variable "Looks like a number". Is there any equivalent function for recognizing dates or timestamps?

Upvotes: 2

Views: 275

Answers (1)

Sobrique

Reputation: 53478

Thinking about the general case, there are just an awful lot of different ways to write a date. That's why most systems don't try to recognise them: they use a numeric timestamp internally and only format it as a date on request.

Not least of the problems is the ambiguity of the digits themselves: by convention US date formats are month/day, but much of the rest of the world uses day/month.
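
To make that ambiguity concrete, here is a small sketch using Time::Piece. The sample string is made up for illustration; the point is that it parses cleanly under both conventions, so a successful parse alone doesn't tell you which one was meant.

```perl
use strict;
use warnings;
use Time::Piece;

# Made-up sample value: valid as both month/day and day/month.
my $ambiguous = '03/04/2015';

my $us    = Time::Piece->strptime( $ambiguous, '%m/%d/%Y' );
my $other = Time::Piece->strptime( $ambiguous, '%d/%m/%Y' );

print $us->ymd,    "\n";    # 2015-03-04 (4th of March)
print $other->ymd, "\n";    # 2015-04-03 (3rd of April)
```

Both calls succeed, so you need to know (or guess from the rest of the data) which convention the source uses.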

However, given that you do have fields you're trying to process, the approach I'd probably go with is: use something like strptime to parse the string into a timestamp, validate the timestamp (e.g. is it 'sensible' given the data), and if it is, assume the parse was correct.

E.g.:

#!/usr/bin/env perl

use strict;
use warnings;
use Time::Piece;

# Candidate formats, tried in order.
my @formats = ( '%Y/%m/%d %H:%M:%S', '%d %b %y' );

my @example_strs = ( '14 Oct 15', '2014/08/22 17:42:33', 'bogus' );

foreach my $example_str (@example_strs) {
    my $timestamp;
    foreach my $format (@formats) {
        # strptime dies when the string doesn't match, so trap that with eval.
        $timestamp = eval { localtime->strptime( $example_str, $format ) };
        next unless $timestamp;
        print "$example_str converted to $timestamp using $format\n";
        last;    # stop at the first format that matches
    }
    print "Couldn't parse $example_str\n" unless $timestamp;
}

You could also add some range checking on $timestamp to ensure the date is sane.

e.g.

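# Reject anything more than a year in the past or a day in the future.
my $one_day = 24 * 60 * 60;
if (   $timestamp->epoch < time() - 365 * $one_day
    or $timestamp->epoch > time() + $one_day )
{
    # assume it's invalid.
}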

The same range check works for fields that are plain integers (epoch seconds) too, but if an ordinary integer value happens to be close to time() it's impossible to tell the difference. (Statistically speaking, a value in that range probably is a timestamp.)

Have a flick through the strftime documentation as well, to see just how many formatting options there are.
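
Since the end goal is local time to GMT, the parsing step can be wrapped up with the conversion. This is only a sketch: to_gmt is a made-up helper name, the format list is whatever you expect in your data, and it assumes a Time::Piece where localtime->strptime interprets the string in the local time zone.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: try each format in turn and return a GMT
# Time::Piece object, or undef if none of the formats match.
sub to_gmt {
    my ( $str, @formats ) = @_;
    for my $format (@formats) {
        # strptime dies on a mismatch; eval turns that into undef.
        my $local = eval { localtime->strptime( $str, $format ) };
        next unless $local;

        # Same instant in time, presented as GMT instead of local time.
        return gmtime( $local->epoch );
    }
    return;
}

my @formats = ( '%Y/%m/%d %H:%M:%S', '%d %b %y' );

for my $field ( '2014/08/22 17:42:33', 'bogus' ) {
    my $gmt = to_gmt( $field, @formats );
    print defined $gmt
        ? "$field is " . $gmt->datetime . " GMT\n"
        : "$field left alone (not a recognised date)\n";
}
```

Fields that come back undef are simply left untouched, which gives you roughly the looks_like_number-style test from the question, limited to the formats you list.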

Upvotes: 4
