Reputation: 2298
As a PHP programmer new to Perl working through 'Programming Perl', I have come across the following regex:
/^(.*?): (.*)$/;
This regex is intended to parse an email header and insert it into a hash. The email header is contained in a separate .txt file and is in the following format:
From: [email protected]
To: [email protected]
Date: Mon, 1st Jan 2000 09:00:00 -1000
Subject: Subject here
The entire code I am using to work with this example regex is as follows:
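use warnings;
use strict;
my %fields = ();
# Three-argument open with a lexical filehandle; $! reports why the open failed.
open(my $fh, '<', 'header.txt') or die("Could not open header.txt: $!");
while (<$fh>)
{
    # Store the pair only when this line matched; after a failed match
    # $1 and $2 do not describe the current line.
    if (/^(.*?): (.*)$/)
    {
        $fields{$1} = $2;
    }
}
close($fh);
# Print each header name together with its value.
foreach (sort keys %fields)
{
    print "$_ => $fields{$_}\n";
}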
Now, onto my question. I am unsure why the first subpattern has been modified to use a minimal quantifier. It is perhaps a small point to get hung up on, but I cannot see why it has been done.
Thanks for any replies.
Upvotes: 2
Views: 215
Reputation: 67900
The reason it uses a minimal quantifier is that it does not need to read any further than the first colon, and in fact it should not. I'm not sure which characters can appear in these header names, but I am pretty sure that "." (any character) is a bit too wide, and that is the problem. If your values contain any colons, a greedy pattern would gobble them all up, for example:
Subject: Counter Strike: Source
If the first subpattern were greedy, it would grab "Subject: Counter Strike", and not just "Subject".
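One way to narrow the first subpattern, as hinted above, is a negated character class. This is only a sketch: it assumes header names never contain a colon, and the variable names $line, $name and $value are made up for the illustration.

```perl
use strict;
use warnings;

# Sample line taken from the answer above.
my $line = 'Subject: Counter Strike: Source';

# Assumption: a header name never contains a colon, so [^:]+ cannot
# run past the first colon, whether the quantifier is greedy or not.
my ($name, $value);
if ($line =~ /^([^:]+): (.*)$/) {
    ($name, $value) = ($1, $2);
    print "name:  $name\n";     # name:  Subject
    print "value: $value\n";    # value: Counter Strike: Source
}
```

With this form the minimal quantifier is no longer needed, because the character class itself stops at the first colon.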
Upvotes: 4
Reputation: 4349
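Without that minimal quantifier, $1 would run up to the last ": " (colon followed by a space) in the line, because Perl quantifiers are greedy by default. The "Date:" line in the question is not actually affected, though: the colons in 09:00:00 are not followed by a space, so $1 is "Date" either way. The difference only shows up when the value itself contains a colon followed by a space.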
Upvotes: 0
Reputation: 5425
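Without a minimal quantifier, wouldn't the first capture for the Date line run past "Date"? Only if the value contained a colon followed by a space. The pattern requires ": ", and none of the colons in 09:00:00 is followed by a space, so the first capture is "Date" with or without the "?".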
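This can be checked directly against the Date line from the question. A small sketch, where $line, $greedy and $minimal are illustrative names and the greedy pattern is simply the question's pattern with the "?" removed:

```perl
use strict;
use warnings;

# The Date line from the question's header.txt.
my $line = 'Date: Mon, 1st Jan 2000 09:00:00 -1000';

# In list context a match returns its captures; keep only the first one.
my ($greedy)  = $line =~ /^(.*): (.*)$/;
my ($minimal) = $line =~ /^(.*?): (.*)$/;

print "greedy:  $greedy\n";     # greedy:  Date
print "minimal: $minimal\n";    # minimal: Date
```

Both captures are the same here because the only colon followed by a space is the one right after "Date".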
Upvotes: 0
Reputation: 20663
Because otherwise it will match all characters up to the last ": " (colon followed by a space). For example, without the minimal quantifier this string:
Test: My: Weird: String
will match "Test: My: Weird" as the first group. But with the minimal quantifier it will match only "Test".
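The two behaviours can be compared side by side. A minimal sketch using the example string above; @greedy and @minimal are just illustrative names for the captured groups:

```perl
use strict;
use warnings;

my $line = 'Test: My: Weird: String';

# In list context each match returns ($1, $2).
my @greedy  = $line =~ /^(.*): (.*)$/;     # greedy first group
my @minimal = $line =~ /^(.*?): (.*)$/;    # minimal first group

print "greedy:  [$greedy[0]] [$greedy[1]]\n";      # greedy:  [Test: My: Weird] [String]
print "minimal: [$minimal[0]] [$minimal[1]]\n";    # minimal: [Test] [My: Weird: String]
```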
Upvotes: 4
Reputation: 206831
If it hadn't, there is a risk that it wouldn't match correctly if the value contains ": " (a colon followed by a space).
Imagine:
Subject: Urgent: Need a regex
Without the minimal match, $1 would get "Subject: Urgent", and $2 would be "Need a regex".
Upvotes: 7
Reputation: 8996
Consider what happens if the subject is "Subject: RE: reply to something".
A minimal quantifier will stop after "Subject", but the greedy quantifier will match up to "RE".
Upvotes: 6