Reputation: 4077
I'm trying to extract the From and Cc email addresses from a forwarded email whose body looks like this:
$body = '-------
Begin forwarded message:
From: Sarah Johnson <[email protected]>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <[email protected]>
Cc: Ralph Johnson <[email protected]>
Hi,
hello, thank you and goodbye!
[email protected]';
Now, when I do the following:
$body = strtolower($body);
$pattern = '#from: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
I correctly get:
from: sarah johnson <[email protected]>
So why doesn't the Cc version work? I do almost exactly the same thing, only changing from to cc:
$body = strtolower($body);
$pattern = '#cc: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
and I get:
cc: ralph johnson <[email protected]> hi, hello, thank you and goodbye! [email protected]
If I remove the email address from the footer of the original body (removing [email protected]), I correctly get:
cc: ralph johnson <[email protected]>
It looks like the footer email address is affecting the regular expression. But how, and why doesn't it affect the From match? How can I fix this?
Upvotes: 0
Views: 143
Reputation: 47874
Parsing the email's metadata doesn't need to use a convoluted regex pattern.
Use ^ with the m pattern modifier to start matches from the beginning of any new line.
Match the start of a line before a colon with [^:\v]+. The \v in the negated character class prevents matching across multiple lines.
For easy access, form an associative array from the two captured values of each qualifying line.
Code:
preg_match_all('/^([^:\v]+): *(.+)/m', $body, $m);
var_export(
array_combine($m[1], $m[2])
);
Output:
array (
'From' => 'Sarah Johnson <[email protected]>',
'Subject' => 'email subject',
'Date' => 'February 22, 2013 3:48:12 AM',
'To' => 'Email Recipient <[email protected]>',
'Cc' => 'Ralph Johnson <[email protected]>',
)
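If only the bare addresses are needed, the values of that associative array can be reduced to whatever sits between the angle brackets. This is a minimal sketch rather than part of the answer above: the $headers array is hand-written to mirror the output shape, and the example.com addresses are placeholders, since the real ones are hidden in the post.

```php
<?php
// Hypothetical header values shaped like the output above; addresses are placeholders.
$headers = [
    'From'    => 'Sarah Johnson <sarah@example.com>',
    'Subject' => 'email subject',
    'Cc'      => 'Ralph Johnson <ralph@example.com>',
];

$addresses = [];
foreach (['From', 'Cc'] as $name) {
    // Keep the text between the angle brackets, if the header exists and has an address.
    if (isset($headers[$name])
        && preg_match('/<([^<>\s]+@[^<>\s]+)>/', $headers[$name], $m)) {
        $addresses[$name] = $m[1];
    }
}

var_export($addresses);
```

A header that is missing or has no bracketed address is simply left out of $addresses, so callers can test with isset() before using a value.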
Upvotes: 0
Reputation: 1805
Try it like this:
$body = '-------
Begin forwarded message:
From: Sarah Johnson <[email protected]>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <[email protected]>
Cc: Ralph Johnson <[email protected]>
Hi,
hello, thank you and goodbye!
[email protected]';
$pattern = '#(?:from|Cc):\s+[^<>]+<([^@]+@[^>\s]+)>#is';
preg_match_all($pattern, $body, $arr_matches);
echo '<pre>' . htmlspecialchars(print_r($arr_matches, 1)) . '</pre>';
Output
Array
(
[0] => Array
(
[0] => From: Sarah Johnson <[email protected]>
[1] => Cc: Ralph Johnson <[email protected]>
)
[1] => Array
(
[0] => [email protected]
[1] => [email protected]
)
)
$arr_matches[1][0] - "From" email
$arr_matches[1][1] - "Cc" email
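Reading the results by position assumes that both headers are present and in that order. A possible variation, sketched here under that concern, captures the header name as well and keys the result by it. The pattern, the shortened $body, and the example.com addresses are assumptions made for illustration, not part of the answer above.

```php
<?php
// Shortened sample body; addresses are placeholders for the ones hidden in the post.
$body = "From: Sarah Johnson <sarah@example.com>\n"
      . "To: Email Recipient <recipient@example.com>\n"
      . "Cc: Ralph Johnson <ralph@example.com>\n"
      . "Hi,\nfooter@example.com";

// ^ with the m modifier anchors to a line start; excluding \r and \n
// keeps the display-name part from running onto the next line.
$pattern = '#^(from|cc):\h+[^<>\r\n]*<([^@<>\s]+@[^>\s]+)>#im';
preg_match_all($pattern, $body, $matches, PREG_SET_ORDER);

$emails = [];
foreach ($matches as $match) {
    // Lowercase the header name so lookups do not depend on the original casing.
    $emails[strtolower($match[1])] = $match[2];
}

var_export($emails);
```

With this shape, $emails['cc'] is either the Cc address or unset, regardless of where the header appears in the body.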
Upvotes: 1
Reputation: 92976
The problem is that \D* matches too much: it also matches newline characters. I would be more restrictive here. Why do you use \D (not a digit) at all?
With e.g. [^@]* it works:
cc: [^@]*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S
See it here on Regexr.
This way, you can be sure that the first part does not match beyond the first email address.
\D is also the reason it works in the first case, "From": the "Date" row contains digits, so \D* cannot match past that row.
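The behaviour described above can be reproduced with a short script. This is a sketch under a few assumptions: the example.com addresses are placeholders for the ones hidden in the post, the hyphen in the character class is moved to the end (newer PCRE2-based PHP versions can reject [\w-\.] as an invalid range), and the last pattern is one possible line-bounded alternative, not the fix proposed above.

```php
<?php
// Placeholder addresses on example.com stand in for the ones hidden in the post.
$body = strtolower('-------
Begin forwarded message:
From: Sarah Johnson <sarah@example.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <recipient@example.com>
Cc: Ralph Johnson <ralph@example.com>
Hi,
hello, thank you and goodbye!
footer@example.com');

// Same shape as the question's patterns; the hyphen is placed last in the class.
$tail = '\D*\S([\w.-]+)@((?:\w+\.)+)([a-z]{2,4})\S#';
preg_match('#from: ' . $tail, $body, $from);
preg_match('#cc: ' . $tail, $body, $cc);

// Assumed alternative: anchor to the line start and exclude line breaks.
preg_match('#^cc:\h*[^<\r\n]*<([^<>@\s]+@[^<>\s]+)>#m', $body, $bounded);

echo $from[0], "\n";     // \D* is stopped by the digits in the Date line
echo $cc[0], "\n";       // no digits follow, so the match runs on to the footer address
echo $bounded[1], "\n";  // only the address on the Cc line
```

Because \D* is greedy, it first consumes every non-digit it can and then backtracks to the last position where an address still fits, which is why the Cc match ends at the footer address instead of the first one.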
Upvotes: 3