Reputation: 57
I am trying to parse a file and capture all the "sent" dates. The format of the dates varies, so I am looking for patterns and choosing the matching format (to give to Time::Piece::strptime). My date patterns are as follows: "Mon Nov 13 12:34:10 2006" or "Tuesday, November 14, 2006 10:58 AM". I used a lookbehind assertion to check whether the line ends in AM or PM and handle the two cases. I was reading the file line by line and wrote the following code:
print "$2<-\n" if $line =~ /(Sent):\s*([^\n]+)(?<=AM|PM)$/;
print "$2<- \n" if $line =~ /(Sent):\s*([^\n]+)(?<!AM|PM)$/;
The problem I ran into is that sometimes there is whitespace at the end of a line, before the newline, like so: "Tuesday, November 14, 2006 10:58 AM " or "Mon Nov 13 12:34:10 2006 ". I can't figure out how to write the lookbehind so that it checks whether the date does or does not end in AM or PM, followed by a possible space at the end. It ends up matching both times. I know I could restructure the loop (put a proper block around the first match and skip ahead with "next" once it matches), but I really want to understand what the regex engine is doing. Also, why does $2 contain the AM and PM? Thanks
Upvotes: 0
Views: 69
Reputation: 6808
Does the following code extract dates correctly?
use strict;
use warnings;
use feature 'say';
my $pattern = qr/Sent:\s+(.*?)\s*$/;
my $date;
while( <DATA> ) {
next if /^$/;
$date = undef;
$date = $1 if /$pattern/;
say "[$date]" if $date;
}
__DATA__
Sent: Mon Nov 13 12:34:10 2006
Sent: Fri Apr 13 12:34:10 2007
Sent: Sat Jun 13 12:34:10 2009
Some extra line to skip
Sent: Tuesday, November 14, 2006 10:58 AM
Sent: Monday, November 16, 2006 6:20 AM
Sent: Thursday, November 17, 2006 8:18 PM
Other extra line to skip
Sent: Wednesday, December 4, 2006 1:06 PM
Output:
[Mon Nov 13 12:34:10 2006]
[Fri Apr 13 12:34:10 2007]
[Sat Jun 13 12:34:10 2009]
[Tuesday, November 14, 2006 10:58 AM]
[Monday, November 16, 2006 6:20 AM]
[Thursday, November 17, 2006 8:18 PM]
[Wednesday, December 4, 2006 1:06 PM]
Upvotes: 0
Reputation: 386396
if ( my ($sent) = $line =~ /Sent:\s*(.*)/ ) {
print "= $sent\n" if $sent =~ /[AP]M\s*$/;
print "! $sent\n" if $sent !~ /[AP]M\s*$/;
}
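To connect this back to the question, here is a minimal sketch of why both lookbehind patterns can match once `\s*` is allowed before `$`, plus one way to pick a format per line. The helper name `sent_format` and the two strptime format strings are my own assumptions, derived only from the two sample dates in the question, so adjust them to your data. Note that a lookbehind is zero-width: the AM or PM is consumed by the capture group before it, which is why it shows up in `$2`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date_text, strptime_format) for a "Sent:" line,
# or an empty list when the line carries no date.
sub sent_format {
    my ($line) = @_;
    # Non-greedy capture plus \s*$ keeps trailing whitespace out of $sent.
    my ($sent) = $line =~ /Sent:\s*(.*?)\s*$/ or return;
    return if $sent eq '';
    # Assumed formats, based on the two samples given in the question.
    my $format = $sent =~ /[AP]M$/
        ? '%A, %B %d, %Y %I:%M %p'
        : '%a %b %d %H:%M:%S %Y';
    return ($sent, $format);
}

# Demonstration of the "matches both times" behaviour with a trailing space.
my $line = "Sent: Tuesday, November 14, 2006 10:58 AM \n";

# The engine backtracks [^\n]+ until it ends right after "AM"; \s* eats the rest.
my $pos = $line =~ /Sent:\s*([^\n]+)(?<=AM|PM)\s*$/;

# Here [^\n]+ keeps the trailing space, so the two characters before the
# assertion are "M ", which is neither AM nor PM, and the assertion succeeds.
my $neg = $line =~ /Sent:\s*([^\n]+)(?<!AM|PM)\s*$/;

print "positive: ", ($pos ? "match" : "no match"), "\n";
print "negative: ", ($neg ? "match" : "no match"), "\n";
```

Both lines print "match" for this input. Stripping the trailing whitespace from the capture first, as above, means the AM/PM test always looks at the real end of the date, and the returned format can then be handed to Time::Piece's strptime.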
Upvotes: 1