Reputation: 20862

Perl multiple pattern matching and substitution in one pass

09/27/2009 19:48:00 Departure Location

I am trying to match and reformat lines like the one above from a text file. The length of the text after the date and time can vary. I am reading the file line by line, and I need the final output to be printed as:

Date=> 09/27/2009
Time=> 19:48:00 
Text=> Departure Location

I have tried to do the substitutions in one pass as follows:

if($line =~ m/(\d+)\/(\d+)\/(\d+)\h{1}(\d+):(\d+):(\d+)/){

    $line =~ s/(\[a-zA-Z])/\nText=> $1/;
    $line =~ s/(\d+)\/(\d+)\/(\d+)/\nDate=> $1\/$2\/$3/;
    $line =~ s/\h{1}(\d+):(\d+):(\d+)/\nTime=> $1\:$2\:$3/;

    print FH "$line\n";

}

But all I am getting is this:

Date=> 09/27/2009
Time=> 19:48:00 Departure Location

I know there is a problem in matching the Text but I am not able to fix it. I am still a Perl beginner. Any help is appreciated. Thanks!

Upvotes: 2

Answers (4)

Borodin

Reputation: 126722

Cramming as much functionality as possible into a small space only contributes to Perl's reputation for being incomprehensible.

This code seems much clearer to me:

$line = <<END if $line =~ m|^(\d\d/\d\d/\d{4}) \s+ (\d\d:\d\d:\d\d) \s+ (.*)|x;
Date=> $1
Time=> $2 
Text=> $3
END

Upvotes: 2

ikegami

Reputation: 385789

You're doing too much work in your parser.

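use feature 'say';    # say is not enabled by default

chomp;                # assumes the current line is in $_; drop its newline
# A limit of 3 keeps everything after the time together as the text.
my ($date, $time, $text) = split(' ', $_, 3);
say "Date=> $date";
say "Time=> $time";
say "Text=> $text";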
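A small check of how the limit behaves may help here. This is only a sketch: the sample line and the variable name $line come from the question, and the trailing newline is an assumption about what a file read would leave on the line. With a limit of 3, split stops after the second separator, so the text keeps its internal spaces.

```perl
use strict;
use warnings;

# Sample line from the question, with the newline a file read would leave on it.
my $line = "09/27/2009 19:48:00 Departure Location\n";
chomp $line;

# Limit 3: at most three fields, so the remainder stays in one piece.
my ($date, $time, $text) = split ' ', $line, 3;

# Without a limit the text is broken up at every run of whitespace.
my @unlimited = split ' ', $line;

print "Date=> $date\nTime=> $time\nText=> $text\n";
print scalar(@unlimited), " fields without a limit\n";
```

Without the limit, "Departure" and "Location" would land in separate fields, which is why the third argument matters for free-form text.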

Upvotes: 2

DavidO

Reputation: 13942

This pattern in particular is giving you trouble:

$line =~ s/(\[a-zA-Z])/\nText=> $1/;

There are a few problems with it. First, the backslash in front of the left bracket, \[, escapes the bracket, so what looks like a character class is not one at all; the pattern matches the literal text "[a-zA-Z]". Second, the class allows no whitespace, so even without the backslash it could not match across the space (or any punctuation) in the text portion of the string. Third, there is no quantifier, so it would match only a single character. Finally, it should probably be anchored to the end of the string. It might work like this (but don't use it, read on instead):

$line =~ s/([a-zA-Z\s]+)$/\nText=> $1/;
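The first and third points can be checked directly. The following is a minimal sketch using the sample line from the question; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = '09/27/2009 19:48:00 Departure Location';

# \[ is a literal bracket, so this pattern looks for the text "[a-zA-Z]".
my $matches_sample  = $line      =~ /\[a-zA-Z]/ ? 1 : 0;
my $matches_literal = '[a-zA-Z]' =~ /\[a-zA-Z]/ ? 1 : 0;

# A real character class without a quantifier captures one character only.
my ($single) = $line =~ /([a-zA-Z])/;

print "escaped pattern matches the sample line: $matches_sample\n";
print "escaped pattern matches the literal text: $matches_literal\n";
print "unquantified class captured: '$single'\n";
```

The escaped pattern never matches the sample line, so the Text substitution in the question silently does nothing, which is consistent with the output shown there.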

But there's probably a better solution. It can all be done in one pass without losing clarity. To me it starts to make more sense if you capture larger segments:

$string =~ s{^
    (\d\d/\d\d/\d{4})\s    # The date.
    (\d\d:\d\d:\d\d)\s     # The time.
    (.+)$                  # The rest (the text).
}{Date=> $1\nTime=> $2\nText=> $3}x;

As is usually the case, the /x modifier makes the code easier to read.
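Since the question reads the file line by line, here is a hedged sketch of the one-pass substitution inside such a loop. The in-memory filehandle and the second and third input lines are assumptions made for illustration, and \s+ is used instead of \s to tolerate extra spacing. It relies on s/// returning true only when it actually replaced something.

```perl
use strict;
use warnings;

# Stand-in for the input file. Only the first line comes from the question;
# the other two are invented for this example.
my $data = "09/27/2009 19:48:00 Departure Location\n"
         . "not a log line\n"
         . "09/28/2009 07:05:30 Arrival Gate 12\n";

open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$in>) {
    chomp $line;

    # Skip lines that do not have the date/time/text shape.
    next unless $line =~ s{^
        (\d\d/\d\d/\d{4}) \s+    # The date.
        (\d\d:\d\d:\d\d)  \s+    # The time.
        (.+) $                   # The rest (the text).
    }{Date=> $1\nTime=> $2\nText=> $3}x;

    push @records, $line;
}
close $in;

print "$_\n\n" for @records;
```

Collecting the rewritten lines in @records is just one option; printing to an output filehandle inside the loop, as in the question, would work the same way.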

There are some good resources available for getting a handle on Perl's regular expressions. I would suggest starting with perldoc perlretut, which is "a basic tutorial on understanding, creating and using regular expressions in Perl."

Using named captures (available since Perl 5.10) can also add a degree of clarity, especially as your regexes become more complex:

$string =~ s{
    ^
    (?<date>\d\d/\d\d/\d{4})\s
    (?<time>\d\d:\d\d:\d\d)\s
    (?<text>.+)
    $
}
{Date=> $+{date}\nTime=> $+{time}\nText=> $+{text}}x;
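Named captures are also handy when you want the pieces rather than a rewritten string. The sketch below uses a plain match and copies %+ into a hash; the %record name is an assumption for illustration, not part of the original answer.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later; also enables say

my $string = '09/27/2009 19:48:00 Departure Location';

my %record;
if ($string =~ m{
        ^
        (?<date> \d\d/\d\d/\d{4} ) \s+
        (?<time> \d\d:\d\d:\d\d  ) \s+
        (?<text> .+ )
        $
    }x) {
    # Copy %+ right away: it always reflects the most recent successful match.
    %record = %+;
}

say "$_=> $record{$_}" for qw(date time text);
```

The keys here are lowercase because they are the capture names; mapping them to the "Date", "Time" and "Text" labels is left to the output step.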

Upvotes: 4

Sinan Ünür

Reputation: 118128

split with a limit would work nicely here. Passing @fields as the limit uses its element count (3). pairwise is not strictly necessary, but it helped me avoid a loop:

#!/usr/bin/env perl

use strict; use warnings;
use feature 'say';
use List::MoreUtils qw( pairwise );

my $input = q{09/27/2009 19:48:00 Departure Location};
my @fields = qw(Date Time Text);
my @values = split ' ', $input, @fields;

{
    no warnings 'once';
    say join("\n", pairwise { "$a=> $b" } @fields, @values);
}

Output:

Date=> 09/27/2009
Time=> 19:48:00
Text=> Departure Location
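If List::MoreUtils is not installed, the same pairing can be done with core Perl only. This is a minimal sketch of that alternative, not the answer's own method; it walks both arrays by index with map.

```perl
use strict;
use warnings;

my $input  = q{09/27/2009 19:48:00 Departure Location};
my @fields = qw(Date Time Text);

# scalar @fields is 3, which serves as the split limit.
my @values = split ' ', $input, scalar @fields;

# Pair each label with its value by index instead of calling pairwise.
my @lines = map { "$fields[$_]=> $values[$_]" } 0 .. $#fields;

print join("\n", @lines), "\n";
```

The trade-off is an explicit index range in exchange for dropping the non-core dependency and the "no warnings 'once'" block.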

Upvotes: 5
