PFranchise

Reputation: 6752

Simple Perl Regex parser

Hey, I am working on a very basic parser. I am almost certain that my regex is correct, but nothing seems to be stored in $1 and $2. Am I doing something wrong? I am new to Perl, so I am also looking for tips on improving my code; I want to get off on the right foot and develop solid habits. Thanks for any advice!

Sample line from file:

Sat 02-August-2008 20:47 - 123.112.3.209 - "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;

I only want to extract the hour from each timestamp and count how many lines fall in each hour.

foreach my $line (@lines)
{
    my $match =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/;

    if( $1 == 0)
    {
        $times[$2] = $times[$2] + 1;
    }
    else
    {
        my $time = $1.$2;
        $times[$time] = $times[$time]+ 1;
    }
}

print "\n";
for(my $i=0;$i<24;$i++)
{
    print "$i: $times[$i]\n";
}

Upvotes: 3

Views: 396

Answers (3)

dawg

Reputation: 103834

First, if you are new to Perl: one of the language's strengths is CPAN and the many ready-made solutions there. Don't reinvent the wheel!

There is a great module called Date::Parse that will parse the date and time for you. The only regex problem left is then separating the timestamp from the rest of your line.

Based on your one line sample, this code will do that:

use strict;
use warnings;

use Date::Parse;

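my $line="Sat 02-August-2008 20:47 - 123.112.3.209 - \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;";
my $tmpart;

# Capture everything up to and including the HH:MM that precedes " -".
if ($line=~ /^(.*\d+:\d+) -/) {
    $tmpart=$1;

    print "Time part = $tmpart\n";

    # str2time returns seconds since the Unix epoch.
    my $time=str2time($tmpart);
    # strptime returns the individual fields; fields missing from the string are undef.
    my ($ss,$mm,$hh,$day,$month,$year,$zone) = strptime($tmpart);

    $ss = 0 unless defined $ss;   # the sample line has no seconds
    $year+=1900;                  # year is returned as an offset from 1900
    $month+=1;                    # month is returned zero-based

    print "Unix time: $time\n";
    print "Parsed time: $month/$day/$year $hh:$mm:$ss  \n\n";
}
else {
    warn "no match!\n";
}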

str2time returns a Unix timestamp, which is easy to work with. Alternatively, as shown, strptime gives you the individual components of the time.
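If installing a CPAN module is not an option, a similar result is possible with Time::Piece, which ships with Perl. This is a different module from the one used above, shown only as a rough sketch; the format string is an assumption based on the single sample line and may need adjusting for other input.

```perl
use strict;
use warnings;
use Time::Piece;

my $line = 'Sat 02-August-2008 20:47 - 123.112.3.209 - "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;';
my $t;

# Capture the date and time, skipping the weekday name.
if ($line =~ /^\w+ (\d\d-\w+-\d{4} \d\d:\d\d) -/) {
    # Assumed format: day-fullmonthname-year hour:minute
    $t = Time::Piece->strptime($1, '%d-%B-%Y %H:%M');
    printf "hour=%d epoch=%d\n", $t->hour, $t->epoch;
}
else {
    warn "no match!\n";
}
```

The hour is then available as $t->hour without any further string handling.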

Upvotes: 1

Dasvid

Reputation: 43

Can you give an example of the kind of pattern you are trying to match? Otherwise I can't tell whether your regex matches it. However, there are some improvements you can make to your code:

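First off, always test whether a match was successful before you use $1, $2, etc. After a failed match they are not reset, so they are either undefined or still hold values from an earlier successful match.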

if($match =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/) {

    if( $1 == 0)
    {
        $times[$2] = $times[$2] + 1;
    }
    else
    {   
        my $time = $1.$2;
        $times[$time] = $times[$time]+ 1;
    }
} else {
    warn "no match!\n";
}

Second, always use the '-w' switch (or, equivalently for your own file, 'use warnings;'). In this case you will probably get warnings that $1 and $2 are uninitialized because the match failed:

#!/usr/bin/perl -w
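To illustrate what warnings can reveal here, the following sketch repeats the match statement from the question and collects the warnings in an array rather than letting them go to STDERR. The @warnings array and the handler are only illustrative scaffolding; in a normal script the message would simply be printed when the line runs.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $line = 'Sat 02-August-2008 20:47 - 123.112.3.209';

# Same statement as in the question: the pattern is bound to a freshly
# declared, undefined variable rather than to $line.
my $match =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/;

# Show whatever Perl complained about.
print "warning: $_" for @warnings;
```

The collected message should mention an uninitialized value, which points at the variable being matched rather than at the regex itself.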

Upvotes: 3

Brian Rasmussen

Reputation: 116401

If you want to match on $line, shouldn't the code read:

$line =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/;

See here.
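Putting that together, one possible version of the whole loop is sketched below. The second and third entries in @lines are made-up sample data, and capturing the hour as a single two-digit group is a simplification of the original two captures, not something the question requires.

```perl
use strict;
use warnings;

# Only the first line comes from the question; the others are hypothetical.
my @lines = (
    'Sat 02-August-2008 20:47 - 123.112.3.209 - "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;',
    'Sun 03-August-2008 07:05 - 10.0.0.1 - "ExampleAgent/1.0"',
    'a line that does not match',
);

# One counter per hour of the day, all starting at zero.
my @times = (0) x 24;

foreach my $line (@lines) {
    # Bind the pattern to $line and use the capture only if it matched.
    if ($line =~ /\d\d-\w+-\d{4} (\d\d):\d\d/) {
        # An array subscript is numeric, so "07" already means index 7;
        # the "+ 0" just makes that conversion explicit.
        $times[$1 + 0]++;
    }
    else {
        warn "no match: $line\n";
    }
}

for my $hour (0 .. 23) {
    print "$hour: $times[$hour]\n";
}
```

Because the subscript is evaluated numerically, the separate branch for a leading zero is no longer needed.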

Upvotes: 7
