Reputation: 53
I almost have this working, but not quite.
I have a JavaScript string that contains a list of emails each formatted differently (no newlines, edited for legibility's sake):
var emailList = '[email protected],
lucky <[email protected]>,
"William Tell" <[email protected]>,
"John Rambo, III" <[email protected]>,
"there, might, be, several, commas inside the quotes" <[email protected]>,
"yes, this is also a valid email address, can you believe" <yes@this@[email protected]>'
Firstly, I need to split this string into the different emails. Emails are separated by ', ':
[email protected], lucky <[email protected]>
but ', ' might also occur in names enclosed by quotes:
"John Rambo, III" <[email protected]>
Even multiple commas can be found inside quotes:
"there, might, be, several, commas inside the quotes" <[email protected]>
Step 1: replace , enclosed in quotes
I'd like to substitute the commas for something like <<<<!!!!>>>>
I've tried this regexp: (".*)(,)(\s.*"), $1<<<<!!!!>>>>$3
https://regex101.com/r/baha69/1/ but it's NOT replacing commas within quotes... :-(
Step 2: split array and undo comma substitution
This can be easily done now in JavaScript with split and replace:
var Array = emailList.split(', ');
Array.forEach(function(element, index, arr) {
// split/join restores every placeholder; replace() with a string only changes the first one
arr[index] = element.split("<<<<!!!!>>>> ").join(", ");
});
At this point, I should have an array like this (no newlines, edited for legibility's sake):
Array[0] = '[email protected]'
Array[1] = 'lucky
<[email protected]>'
Array[2] = '"William Tell"
<[email protected]>'
Array[3] = '"John Rambo, III"
<[email protected]>'
Array[4] = '"there, might, be, several, commas inside the quotes"
<[email protected]>'
Array[5] = '"yes, this is also a valid email address, can you believe"
<yes@this@[email protected]>'
Step 3: split email addresses
Now I have to turn each individual email into basic components (no newlines, edited for legibility's sake):
Array[0] = {fullName: '',
firstWord: '', localPart: 'peter', company: 'pan',
email: '[email protected]'}
Array[1] = {fullName: 'lucky',
firstWord: 'lucky', localPart: 'jack', company: 'pot',
email: '[email protected]'};
Array[2] = {fullName: 'William Tell',
firstWord: 'William', localPart: 'billy', company: 'tell',
email: '[email protected]'};
Array[3] = {fullName: 'John Rambo, III',
firstWord: 'John', localPart: 'johnny', company: 'rambo',
email: '[email protected]'};
Array[4] = {fullName: 'there, might, be, several, commas inside the quotes',
firstWord: 'there', localPart: 'multiple', company: 'commas',
email: '[email protected]'};
Array[5] = {fullName: 'yes, this is also a valid email address, can you believe',
firstWord: 'yes', localPart: 'yes@this@is', company: 'valid',
email: 'yes@this@[email protected]'};
To do that I'll use the following RegExps:
var firstWord = element.match(/"?(\w*),? .*"?/i)[1];
this works!! :-) https://regex101.com/r/6Z481l/1
var fullName = element.match(/"?(.*)"? </i)[1];
this DOESN'T work: captures trailing " :-( https://regex101.com/r/6Z481l/2
var localpart = element.match(/<(.*)@/i)[1];
this DOESN'T work: peter in peter@pan is not captured :-( https://regex101.com/r/6Z481l/3
var company = element.match(/@(.*)\./i)[1];
this works!! :-) https://regex101.com/r/6Z481l/4
var email = element.match(/<(.*@.*)>|(^[^<].*[^>])/i)[1];
surprisingly, this works!! :-) But I'm almost certain it can be made more elegant https://regex101.com/r/6Z481l/5
Worth mentioning: the emails are presumed to be already validated.
So, I need some help to complete steps 1 and 3. If any working regexp from step 3 can be simplified or made more elegant, I'll learn from that!
Not the goal, but if you come up with ONE magic RegExp that splits the email like I need it, then I can guarantee you are going to certainly wow me and make me feel very small for my lack of RegExp knowledge!!! :-)
Thanks!
Upvotes: 1
Views: 96
Reputation: 3609
I believe you should be able to get your expected end result using regex:
(?:(?:"?((\w+)\b.*\b)"?)\s)?<?(([\w@]*)@(\w*)\.[a-zA-Z]{2,3})>?,?
and replacing it with:
{ fullName:'\1', firstWord:'\2', localPart:'\4', company:'\5', email:'\3'}
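In JavaScript the same pattern can be used through its capture groups instead of a replacement string (JavaScript replacement strings use $1 rather than \1 anyway). Below is a minimal sketch, assuming the list has already been split into single entries: the greedy .* would otherwise run across several entries when the whole list sits on one line. The helper name parseEntry and the ".com" domains in the sample data are made up for illustration.

```javascript
// The answer's pattern as a JavaScript regex literal (no "g" flag, so the
// capture groups are available on the match result).
const entryPattern =
  /(?:(?:"?((\w+)\b.*\b)"?)\s)?<?(([\w@]*)@(\w*)\.[a-zA-Z]{2,3})>?,?/;

// parseEntry is a hypothetical helper; it expects ONE entry, not the whole list.
function parseEntry(entry) {
  const m = entryPattern.exec(entry);
  if (m === null) return null; // no address found
  return {
    fullName: m[1] || '',  // group 1 is undefined for a bare address
    firstWord: m[2] || '',
    localPart: m[4],
    company: m[5],
    email: m[3]
  };
}

// Sample entries; the ".com" domains are assumptions, not from the question.
console.log(parseEntry('"John Rambo, III" <johnny@rambo.com>'));
console.log(parseEntry('peter@pan.com'));
```

Note that the {2,3} at the end limits the top-level domain to two or three letters, so longer ones would need that quantifier relaxed.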
Upvotes: 1
Reputation: 18980
You can split the string at a comma excluding those enclosed in quotes like this:
,(?=(?:[^'"]|'[^']*'|"[^"]*")*$)
This should allow you to get rid of step 1 & 2.
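For illustration, here is a rough sketch of how that pattern could be passed to split in JavaScript. The helper name splitEmailList and the sample addresses (including their ".com" domains) are assumptions. The pattern only consumes the comma, so each piece is trimmed afterwards to drop the space that followed it.

```javascript
// The pattern above: a comma followed only by unquoted characters or
// balanced quoted runs up to the end of the string.
const outsideQuotes = /,(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/;

// splitEmailList is a hypothetical helper name.
function splitEmailList(list) {
  return list.split(outsideQuotes).map(function (entry) {
    return entry.trim(); // drop the space that followed each comma
  });
}

// Made-up sample data (the ".com" domains are assumptions).
const sample = 'peter@pan.com, lucky <jack@pot.com>, ' +
  '"John Rambo, III" <johnny@rambo.com>';
console.log(splitEmailList(sample));
```

Because the pattern also treats single quotes as delimiters, a name containing a lone apostrophe would probably need the '[^']*' alternative removed.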
Regarding the non-functional patterns in step 3:
DOESN'T work: captures trailing "
(?|"([^"]+)"|(.*) <)
: first match balanced quotes, alternatively everything before <.
DOESN'T work: peter in peter@pan is not captured
(<|^)(.*)@
: you could secondarily match from the start.
For the email validation part you should use one of the existing and recommended solutions. But that's another topic, I guess.
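A possible JavaScript adaptation of these two patterns, as a sketch rather than a drop-in: JavaScript regular expressions have no branch-reset group (?|...), so the two alternatives get separate capture groups here. For the local part, .* is swapped for [^<]* in this sketch, because with .* the ^ alternative matches first on entries that have a display name and captures the name as well. The function names and the sample addresses are assumptions.

```javascript
// fullName: a quoted name if present, otherwise everything before " <".
function fullName(entry) {
  const m = entry.match(/"([^"]+)"|(.*) </);
  if (m === null) return ''; // bare address, no display name
  return m[1] !== undefined ? m[1] : m[2];
}

// localPart: text between "<" (or the start of the string) and the last "@".
// [^<]* keeps the match from starting in the display name.
function localPart(entry) {
  const m = entry.match(/(?:<|^)([^<]*)@/);
  return m === null ? '' : m[1];
}

// Made-up sample entries (the ".com" domains are assumptions).
console.log(fullName('"William Tell" <billy@tell.com>'));
console.log(localPart('peter@pan.com'));
```

This still assumes the display name itself contains no @ or < characters.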
Upvotes: 1