Reputation: 1583
I want to scan the body of an email for email address lists from forwarded emails, like:
From: John Smith <[email protected]>
To: Jane Smith <[email protected]>, Mary Smith
<[email protected]>
Cc: Ed Smith <[email protected]>
Subject: this is a test
I'm going to use Mail_RFC822::parseAddressList() to fully parse each list (there are a lot of details to get right in there, so I shouldn't try to re-engineer it), but I do want to pluck out the lines to hand off to this function. I have a simple regex that just looks for lines with email addresses, and that works most of the time.
But in the wild, there are sometimes emails like the example above, where the name and address get split onto different lines. If I do it line by line, the top half of the To: line above will fail to parse at all in parseAddressList() because a name without an address is invalid; and the bottom half will parse, but will be missing the name, which was on the previous line.
So I need a regex that can look at multiple lines at once, which complicates things beyond my expertise. An adequate solution would continue to group lines together as long as it keeps finding a basic email pattern ([\w\.\+\-]+@[\w\.\-]+\.[\w\.\-]+ ... doesn't need to be perfect) but without a word-colon combo at the beginning of the line (^\S*:) so that, as in the example above, the Cc: line is a separate match. Thanks in advance for your help.
Upvotes: 3
Views: 275
Reputation: 3871
How about using the regex s modifier, so that . matches newline characters too: /your regex/s?
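As a rough sketch of that idea, the s modifier can be combined with m so that ^ also matches at the start of every line. The function name extractAddressHeaders, the fixed From/To/Cc list, and the example.com addresses are all placeholders made up for illustration; this is one possible pattern, not a tested solution for every forwarded-mail format.

```php
<?php
// Hypothetical helper: pull From/To/Cc headers, including wrapped lines,
// out of a forwarded-message body.
function extractAddressHeaders(string $body): array
{
    // m: ^ matches at every line start; s: . also matches newlines.
    // The lazy .*? stops at the next line that starts with "word:" or at the end.
    $pattern = '/^(From|To|Cc):[ \t]*(.*?)(?=^\S*:|\z)/ms';
    preg_match_all($pattern, $body, $matches, PREG_SET_ORDER);

    $headers = [];
    foreach ($matches as $m) {
        // Collapse the line break (and any indentation) into a single space.
        $headers[] = [$m[1], preg_replace('/\s+/', ' ', trim($m[2]))];
    }
    return $headers;
}

// Placeholder addresses; the real ones are not shown in the question.
$body = "From: John Smith <john@example.com>\n"
      . "To: Jane Smith <jane@example.com>, Mary Smith\n"
      . "<mary@example.com>\n"
      . "Cc: Ed Smith <ed@example.com>\n"
      . "Subject: this is a test\n";

foreach (extractAddressHeaders($body) as [$name, $value]) {
    echo "$name: $value\n";
}
```

Each value could then be handed to parseAddressList() on its own. Note that this groups everything up to the next line that begins with a word-colon combo, so if the last header is followed directly by body text, that text would be swallowed too; requiring an @ on continuation lines would be one way to tighten it.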
Upvotes: 0
Reputation: 39138
You can pre-process the string to remove newlines before < characters and then pass the result to your parseAddressList function.
Something like replacing /(?:\r?\n|\r)\s*</ with <:
$emails = Mail_RFC822::parseAddressList(preg_replace('/(?:\r?\n|\r)\s*</', '<', $emailHeaders));
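To see what the replacement does on its own, here is a small sketch that runs only the preg_replace step, without the PEAR call. The example.com addresses are placeholders, since the real ones are not shown in the question.

```php
<?php
// Placeholder input with one address split across two lines.
$emailHeaders = "To: Jane Smith <jane@example.com>, Mary Smith\n"
              . "<mary@example.com>\n"
              . "Cc: Ed Smith <ed@example.com>\n";

// Same replacement as above: drop a line break (LF, CRLF, or CR) and any
// indentation that comes directly before a "<".
$joined = preg_replace('/(?:\r?\n|\r)\s*</', '<', $emailHeaders);

echo $joined;
// To: Jane Smith <jane@example.com>, Mary Smith<mary@example.com>
// Cc: Ed Smith <ed@example.com>
```

The name and address end up on one line, with no space before the <. If the parser turns out to need that space, replacing with ' <' instead would be a possible tweak (not tested against parseAddressList here). This only repairs breaks that fall right before a <; a break after a comma, between two complete addresses, is left alone.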
Upvotes: 1