pb149

Reputation: 2298

Perl REGEX Question

As a PHP programmer new to Perl, working through 'Programming Perl', I have come across the following regex:

/^(.*?): (.*)$/;

This regex is intended to parse each line of an email header and insert the field name and value into a hash. The email header is contained in a separate .txt file and is in the following format:

From: [email protected]
To: [email protected]
Date: Mon, 1st Jan 2000 09:00:00 -1000
Subject: Subject here
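For readers new to the syntax, here is the same pattern written with Perl's /x modifier so each piece can carry a comment. This is only a sketch: $header_re and the sample line are illustrative names, not taken from the book.

```perl
use strict;
use warnings;

# The question's pattern, rewritten with /x so each piece can carry a
# comment. Under /x literal spaces are ignored, so the space after the
# colon is written as \x20 instead.
my $header_re = qr{
    ^        # start of the line
    (.*?)    # first capture: the header name, as short as possible
    :\x20    # the first colon that is followed by a space
    (.*)     # second capture: the rest of the line; dot does not match a newline
    $        # end of the line, before any trailing newline
}x;

my ($name, $value) = "Date: Mon, 1st Jan 2000 09:00:00 -1000\n" =~ $header_re;
print "name=[$name] value=[$value]\n";
```

Because . does not match a newline and $ can match just before a trailing newline, the value comes out without the line ending, even though the line was not chomped.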

The entire code I am using to work with this example regex is as follows:

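use warnings;
use strict;

my %fields;

# Three-argument open with a lexical filehandle; $! says why it failed.
open(my $fh, '<', 'header.txt') or die("Could not open header.txt: $!");

while (my $line = <$fh>)
{
    # Only store the captures when the match succeeded; after a failed
    # match (such as a blank line) $1 and $2 do not describe this line.
    if ($line =~ /^(.*?): (.*)$/)
    {
        $fields{$1} = $2;
    }
}

close($fh);

# Print each field name together with its value.
foreach my $name (sort keys %fields)
{
    print "$name: $fields{$name}\n";
}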

Now, onto my question. I am unsure why the first subpattern uses a minimal quantifier. It is perhaps a small point to get hung up on, but I cannot see why it has been done.

Thanks for any replies.

Upvotes: 2

Views: 215

Answers (6)

TLP

Reputation: 67900

The reason it uses a minimal quantifier is that it does not need to read any further than the first colon. And in fact, it should not. I'm not sure exactly which characters can appear in these header names, but I am pretty sure . is a bit too wide, and that is the problem. If a field value contains a colon followed by a space, a greedy (non-minimal) regex would gobble it all up, for example:

Subject: Counter Strike: Source

If the first subpattern were greedy, it would capture Subject: Counter Strike, and not just Subject.
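A small sketch to see the difference, using the example line above. The third pattern swaps in a negated character class, [^:]+, as one way to express "a narrower set than ." (RFC 5322 header field names cannot contain a colon); that alternative is an illustration, not what the book uses.

```perl
use strict;
use warnings;

my $line = 'Subject: Counter Strike: Source';

# Greedy: (.*) first runs to the end of the line, then backtracks until
# ": " can match, which leaves it at the LAST ": " on the line.
my ($greedy_name) = $line =~ /^(.*): (.*)$/;

# Minimal: (.*?) grows one character at a time, so it stops at the FIRST ": ".
my ($lazy_name) = $line =~ /^(.*?): (.*)$/;

# Negated class: [^:]+ can never cross a colon, so greediness does not matter.
my ($class_name) = $line =~ /^([^:]+): (.*)$/;

print "greedy:  $greedy_name\n";   # Subject: Counter Strike
print "minimal: $lazy_name\n";     # Subject
print "class:   $class_name\n";    # Subject
```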

Upvotes: 4

Brian Showalter

Reputation: 4349

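Without that minimal quantifier, and with a pattern that matched a bare colon such as /^(.*):(.*)$/, the value for $1 obtained from the "Date:" line would actually be "Date: Mon, 1st Jan 2000 09:00", because Perl regex quantifiers are greedy by default and (.*) backtracks only as far as the last colon. With the pattern from the question, which requires ": ", the Date line has only one colon followed by a space, so $1 is "Date" either way.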

Upvotes: 0

Shea Levy

Reputation: 5425

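Without a minimal quantifier, wouldn't the first capture for the Date line be "Date: Mon, 1st Jan 2000 09:00" instead of "Date"? (That holds if the pattern matches any colon, as in /^(.*):(.*)$/; the question's pattern requires a space after the colon.)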
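Whether the Date line is affected depends on the space after the colon in the pattern. A quick check, using the Date line from the question; the variable names are illustrative:

```perl
use strict;
use warnings;

my $date = 'Date: Mon, 1st Jan 2000 09:00:00 -1000';

# The pattern requires ": " (colon AND space). The only colon followed by a
# space on this line is the one after "Date", so greedy and minimal agree.
my ($greedy_space) = $date =~ /^(.*): (.*)$/;   # Date
my ($lazy_space)   = $date =~ /^(.*?): (.*)$/;  # Date

# Drop the space and the greedy version backtracks only as far as the last
# colon on the line, the one inside the time "09:00:00".
my ($greedy_colon) = $date =~ /^(.*):(.*)$/;    # Date: Mon, 1st Jan 2000 09:00
my ($lazy_colon)   = $date =~ /^(.*?):(.*)$/;   # Date

print "$_\n" for $greedy_space, $lazy_space, $greedy_colon, $lazy_colon;
```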

Upvotes: 0

Andrey Adamovich

Reputation: 20663

Because otherwise it will match all characters up to the last ': '. For example, without the minimal quantifier, this string:

Test: My: Weird: String

will capture "Test: My: Weird" as the first group. With the minimal quantifier, it captures only "Test".
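If all you need is to cut the line at the first ': ', split with a LIMIT of 2 is an alternative to the minimal quantifier. This is a swapped-in technique, sketched here with the example string above, not what the book does:

```perl
use strict;
use warnings;

my $line = 'Test: My: Weird: String';

# split with a LIMIT of 2 returns at most two fields, so it splits only at
# the first ": " and leaves the remaining ones inside the value.
my ($name, $value) = split /: /, $line, 2;

print "name:  $name\n";    # Test
print "value: $value\n";   # My: Weird: String
```

Inside the question's while loop you would chomp the line first: unlike the regex's (.*), split keeps the trailing newline in the last field.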

Upvotes: 4

Mat

Reputation: 206831

If it hadn't, the line would be split in the wrong place whenever the value contains ': ' (a colon followed by a space).

Imagine:

Subject: Urgent: Need a regex

Without the minimal match, $1 would get Subject: Urgent, and $2 would be Need a regex.
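To see what that does to the hash, here is a sketch of the question's loop run over two in-memory lines, once per pattern. The helper parse_header is hypothetical, written only for this comparison, and the sample lines are assumed:

```perl
use strict;
use warnings;

# Assumed sample header; the Subject value contains ": ".
my @lines = (
    'Date: Mon, 1st Jan 2000 09:00:00 -1000',
    'Subject: Urgent: Need a regex',
);

# Hypothetical helper: build a field hash from header lines with a given pattern.
sub parse_header {
    my ($re, @header_lines) = @_;
    my %fields;
    for my $line (@header_lines) {
        $fields{$1} = $2 if $line =~ $re;
    }
    return %fields;
}

my %greedy  = parse_header(qr/^(.*): (.*)$/,  @lines);
my %minimal = parse_header(qr/^(.*?): (.*)$/, @lines);

print join(', ', sort keys %greedy),  "\n";   # Date, Subject: Urgent
print join(', ', sort keys %minimal), "\n";   # Date, Subject
```

With the greedy pattern, a later lookup of $fields{Subject} would find nothing, because the key became "Subject: Urgent".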

Upvotes: 7

dsolimano

Reputation: 8996

Consider what happens if the subject line is Subject: RE: reply to something.

A minimal quantifier will stop after Subject, but the greedy quantifier will capture everything up to RE, giving Subject: RE as the field name.

Upvotes: 6
