regex for email address list that may span multiple lines

Question

I want to scan the body of an email for email address lists from forwarded emails, like:

From: John Smith <john@example.com>
To: Jane Smith <jane@example.com>, Mary Smith
<mary@example.com>
Cc: Ed Smith <ed@example.com>
Subject: this is a test

I'm going to use Mail_RFC822::parseAddressList() to fully parse each list (there are a lot of details to get right in there, so I shouldn't try to re-engineer it), but I do want to pluck out the lines to hand off to this function. I have a simple regex that just looks for lines with email addresses, and that works most of the time.

But in the wild, there are sometimes emails like the example above, where the name and address get split onto different lines. If I do it line by line, the top half of the To: line above will fail to parse at all in parseAddressList() because a name without an address is invalid; and the bottom half will parse, but will be missing the name, which was on the previous line.

So I need a regex that can look at multiple lines at once, which complicates things beyond my expertise. An adequate solution would continue to group lines together as long as it keeps finding a basic email pattern ([\w\.\+\-]+@[\w\.\-]+\.[\w\.\-]+ ... doesn't need to be perfect) but without a word-colon combo at the beginning of the line (^\S*:) so that, as in the example above, the Cc: line is a separate match. Thanks in advance for your help.

instanceof me · Accepted Answer

You can pre-process the string to remove any line break that comes right before a < character, and then pass the result to your parseAddressList function.

Something like replacing /(?:\r?\n|\r)\s*</ with <:

    // $body is a placeholder name for the string holding the email body
    $emails = Mail_RFC822::parseAddressList(preg_replace('/(?:\r?\n|\r)\s*</', '<', $body));
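
To see what the pre-processing step does before anything reaches the parser, here is a minimal sketch that works only on plain strings. The helper name extractAddressLines, the sample addresses, and the choice to return header/value pairs are assumptions made for illustration. The address pattern is the loose one from the question, and the PEAR parser is deliberately not called so that the snippet runs on its own.

```php
<?php
/**
 * Hypothetical helper (not part of PEAR Mail): joins a line break onto a
 * following "<", then collects header lines that contain something that
 * looks like an email address.
 *
 * @return array<int, array{0: string, 1: string}> list of [header, value] pairs
 */
function extractAddressLines(string $body): array
{
    // Remove a line break (LF, CRLF or CR) plus any whitespace that sits
    // directly before a "<", so a split "Name\n<address>" becomes one line.
    $joined = preg_replace('/(?:\r?\n|\r)\s*</', '<', $body);

    // A "Word:" prefix followed by text containing the loose address
    // pattern from the question. The /m flag makes ^ and $ work per line.
    $pattern = '/^(\S+):[ \t]*(.*[\w.+\-]+@[\w.\-]+\.[\w.\-]+.*)$/m';
    preg_match_all($pattern, $joined, $matches, PREG_SET_ORDER);

    $lines = [];
    foreach ($matches as $m) {
        // rtrim drops a trailing "\r" if the input used CRLF line endings.
        $lines[] = [$m[1], rtrim($m[2])];
    }
    return $lines;
}

// Sample input; the addresses are made-up placeholders.
$sample = "From: John Smith <john@example.com>\n"
        . "To: Jane Smith <jane@example.com>, Mary Smith\n"
        . "<mary@example.com>\n"
        . "Cc: Ed Smith <ed@example.com>\n"
        . "Subject: this is a test\n";

foreach (extractAddressLines($sample) as [$header, $value]) {
    // Each $value could then be handed to Mail_RFC822::parseAddressList().
    echo $header, ': ', $value, "\n";
}
// Prints:
// From: John Smith <john@example.com>
// To: Jane Smith <jane@example.com>, Mary Smith<mary@example.com>
// Cc: Ed Smith <ed@example.com>
```

Note that this only repairs breaks that fall immediately before the "<". A list that wraps somewhere else (for example after a comma) would still need the line-grouping approach described in the question, and a line that legitimately starts with "<" would also be pulled up onto the previous line.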
