Reputation: 11503
For Simple Java Mail, I'm trying to parse a fairly free-form, delimited list of email addresses. Note that I'm specifically not validating, just extracting the individual addresses from the list. For this use case, the addresses can be assumed to be valid.
Here is an example of a valid input:
"[email protected],Sixpack, Joe 1 <[email protected]>, Sixpack, Joe 2 <[email protected]> ;Sixpack, Joe, 3<[email protected]> , [email protected],[email protected];[email protected];"
So there are two basic forms, "[email protected]" and "Joe Sixpack <[email protected]>", which can appear in a comma- or semicolon-delimited string, ignoring whitespace padding. The problem is that the names can contain delimiters as valid characters.
The following array shows the data needed (trailing spaces or delimiters would not be a big problem):
["[email protected]",
"Sixpack, Joe 1 <[email protected]>",
"Sixpack, Joe 2 <[email protected]>",
"Sixpack, Joe, 3<[email protected]>",
"[email protected]",
"[email protected]",
"[email protected]"]
I can't think of a clean way to deal with this. Any suggestions on how to reliably recognize whether a comma is part of a name or a delimiter?
Final solution (variation on the accepted answer):
var string = "[email protected],Sixpack, Joe 1 <[email protected]>, Sixpack, Joe 2 <[email protected]> ;Sixpack, Joe, 3<[email protected]> , [email protected],[email protected];[email protected];"
// Recognize the tail of each entry and replace the delimiter that follows it with an unambiguous one
const result = string
  .replace(/(@.*?>?)\s*[,;]/g, "$1<|>")
  .replace(/<\|>$/, "") // remove the trailing delimiter
  .split(/\s*<\|>\s*/); // split on the delimiter, including surrounding whitespace
console.log(result);
Or in Java:
public static String[] extractEmailAddresses(String emailAddressList) {
    return emailAddressList
            .replaceAll("(@.*?>?)\\s*[,;]", "$1<|>")
            .replaceAll("<\\|>$", "")
            .split("\\s*<\\|>\\s*");
}
Upvotes: 1
Views: 1364
Reputation: 887
This pattern works for your provided examples:
([^@,;\s]+@[^@,;\s]+)|(?:^|\s*[,;])(?:\s*)(.*?)<([^@,;\s]+@[^@,;\s]+)>
([^@,;\s]+@[^@,;\s]+) # bare email: an @ with connected chars on both sides, excluding ',' ';' '@' and white-space
| # OR
(?:^|\s*[,;])(?:\s*) # start of line OR 0 or more spaces followed by a separator, then 0 or more white-space chars
(.*?) # name
<([^@,;\s]+@[^@,;\s]+)> # email enclosed by lt-gt
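As a rough sketch of how this pattern could be applied in JavaScript: the snippet below runs it with `matchAll` and reads the bare address from group 1, or the name and address from groups 2 and 3. It writes the "start of line" alternative as `^`. The sample addresses and the helper name `parseAddresses` are placeholders made up for illustration. One caveat I'd expect: because the name group is lazy but unbounded, a bare address that appears after a delimiter and before a later `Name <address>` entry can be swallowed into that entry's name, so treat this as working for input shaped like the question's example only.

```javascript
// Hypothetical sample input; the addresses are placeholders, not from the original post.
const sampleList =
  "a@example.com,Sixpack, Joe 1 <b@example.com>, Sixpack, Joe 2 <c@example.com> ;" +
  "Sixpack, Joe, 3<d@example.com> , e@example.com,f@example.com;g@example.com;";

// The pattern from this answer, using ^ for the start-of-line alternative.
const addressPattern =
  /([^@,;\s]+@[^@,;\s]+)|(?:^|\s*[,;])(?:\s*)(.*?)<([^@,;\s]+@[^@,;\s]+)>/g;

// parseAddresses is an illustrative helper name.
function parseAddresses(list) {
  return Array.from(list.matchAll(addressPattern), (m) =>
    m[1] !== undefined
      ? { name: null, address: m[1] } // first alternative: bare address
      : { name: m[2].trim(), address: m[3] } // second alternative: Name <address>
  );
}

const parsedAddresses = parseAddresses(sampleList);
console.log(parsedAddresses);
```

Each result keeps the name and the address separate, which may be more convenient than a flat string if the entries are later turned into recipient objects.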
Upvotes: 2
Reputation: 8833
Using Java's replaceAll and split functions (mimicked in JavaScript below), I would lock onto what you know ends an item (the ".com"), replace the separator characters that follow it with a unique temporary token (a UUID or something like <|>), and then split on that token.
Here is a JavaScript example, but Java's replaceAll and split can do the same job.
var string = "[email protected],Joe Sixpack <[email protected]>, Sixpack, Joe <[email protected]> ;Sixpack, Joe<[email protected]> , [email protected],[email protected];[email protected];"
const result = string.replace(/(\.com>?)[\s,;]+/g, "$1<|>").replace(/<\|>$/,"").split("<|>")
console.log(result)
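The "unique temp" part can be made a little more defensive. The following is only a sketch under a few assumptions: the helper name `splitAddressList` and the sample addresses are invented for illustration, and it still locks onto ".com" exactly as the snippet above does. Instead of hard-coding `<|>`, it builds a marker that is guaranteed not to occur in the input, and it uses a replacer function so the marker is never interpreted as a replacement pattern.

```javascript
// splitAddressList is an illustrative helper name.
function splitAddressList(list) {
  // Build a temporary delimiter that cannot collide with the input.
  let marker = "\u0000";
  while (list.includes(marker)) marker += "\u0000";

  return list
    // After each item tail (".com" or ".com>"), swap the separator run for the marker.
    .replace(/(\.com>?)[\s,;]+/g, (_match, tail) => tail + marker)
    .split(marker)
    // A trailing separator leaves one empty item at the end; drop it.
    .filter((item) => item.length > 0);
}

// Hypothetical sample input; the addresses are placeholders.
const items = splitAddressList(
  "a@example.com,Joe Sixpack <b@example.com>, Sixpack, Joe <c@example.com> ;" +
    "Sixpack, Joe<d@example.com> , e@example.com,f@example.com;g@example.com;"
);
console.log(items);
```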
Upvotes: 1
Reputation: 10916
Since you are not validating, I assume that the email addresses are valid.
Based on this assumption, I look for an email address followed by a semicolon (;) or a comma (,); that way I know it's valid.
var string = "[email protected],Sixpack, Joe 1 <[email protected]>, Sixpack, Joe 2 <[email protected]> ;Sixpack, Joe, 3<[email protected]> , [email protected],[email protected];[email protected];"
const result = string.match(/(.*?@.*?\..*?)[,;]/g)
console.log(result)
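Note that with the `g` flag, `match` returns the whole matched text, so each entry above still carries its trailing delimiter and any leading whitespace, and an input without a final delimiter would lose its last address. A possible variation, sketched here with a made-up helper name (`extractAddresses`) and placeholder addresses, reads the capture group through `matchAll`, also accepts end of input as a terminator, and trims each entry:

```javascript
// extractAddresses is an illustrative helper name.
function extractAddresses(list) {
  // Same idea as above: text containing "@" and a later ".", ended by , or ; (or end of input).
  const itemPattern = /(.*?@.*?\..*?)(?:[,;]|$)/g;
  // Group 1 excludes the delimiter; trim removes the padding around each entry.
  return Array.from(list.matchAll(itemPattern), (m) => m[1].trim());
}

// Hypothetical sample input; the addresses are placeholders.
const extracted = extractAddresses(
  "a@example.com,Sixpack, Joe 1 <b@example.com>, Sixpack, Joe 2 <c@example.com> ;" +
    "Sixpack, Joe, 3<d@example.com> , e@example.com,f@example.com;g@example.com"
);
console.log(extracted);
```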
Upvotes: 2