Jay Frias

Reputation: 53

RegExp for splitting a list of emails into basic components (JavaScript)

I almost have this working, but not quite.

I have a JavaScript string that contains a list of emails each formatted differently (no newlines, edited for legibility's sake):

var emailList = '[email protected], 
lucky <[email protected]>, 
"William Tell" <[email protected]>, 
"John Rambo, III" <[email protected]>, 
"there, might, be, several, commas inside the quotes" <[email protected]>, 
"yes, this is also a valid email address, can you believe" <yes@this@[email protected]>'

Firstly, I need to split this string into the different emails. Emails are separated by ', ':

[email protected], lucky <[email protected]>

but ', ' also might occur in names enclosed by quotes:

"John Rambo, III" <[email protected]>

Even multiple commas can be found inside quotes:

"there, might, be, several, commas inside the quotes" <[email protected]>

Step 1: replace commas enclosed in quotes

I'd like to replace those commas with a placeholder such as <<<<!!!!>>>>

I've tried this regexp: (".*)(,)(\s.*"), $1<<<<!!!!>>>>$3 https://regex101.com/r/baha69/1/ but it's NOT replacing commas within quotes... :-(

Step 2: split array and undo comma substitution

This can be easily done now in JavaScript with split and replace:

var Array = emailList.split(', ');
Array.forEach(function(element, index, arr) {
  // split/join restores every placeholder; replace() with a string only swaps the first one
  arr[index] = element.split("<<<<!!!!>>>> ").join(", ");
});

At this point, I should have an array like this (no newlines, edited for legibility's sake):

Array[0] = '[email protected]'
Array[1] = 'lucky
            <[email protected]>'
Array[2] = '"William Tell"
            <[email protected]>'
Array[3] = '"John Rambo, III"
            <[email protected]>'
Array[4] = '"there, might, be, several, commas inside the quotes"
            <[email protected]>'
Array[5] = '"yes, this is also a valid email address, can you believe"
            <yes@this@[email protected]>'

Step 3: split email addresses

Now I have to turn each individual email into basic components (no newlines, edited for legibility's sake):

Array[0] = {fullName: '',
            firstWord: '', localPart: 'peter', company: 'pan', 
            email: '[email protected]'}
Array[1] = {fullName: 'lucky',
            firstWord: 'lucky', localPart: 'jack', company: 'pot', 
            email: '[email protected]'};
Array[2] = {fullName: 'William Tell',
            firstWord: 'William', localPart: 'billy', company: 'tell',
            email: '[email protected]'};
Array[3] = {fullName: 'John Rambo, III',
            firstWord: 'John', localPart: 'johnny', company: 'rambo',
            email: '[email protected]'};
Array[4] = {fullName: 'there, might, be, several, commas inside the quotes', 
            firstWord: 'there', localPart: 'multiple', company: 'commas',
            email: '[email protected]'};
Array[5] = {fullName: 'yes, this is also a valid email address, can you believe', 
            firstWord: 'yes', localPart: 'yes@this@is', company: 'valid',
            email: 'yes@this@[email protected]'};

To do that I'll use the following RegExps:

var firstWord = element.match(/"?(\w*),? .*"?/i)[1];

this works!! :-) https://regex101.com/r/6Z481l/1

var fullName = element.match(/"?(.*)"? </i)[1];

this DOESN'T work: captures trailing " :-( https://regex101.com/r/6Z481l/2

var localpart = element.match(/<(.*)@/i)[1];

this DOESN'T work: peter in peter@pan is not captured :-( https://regex101.com/r/6Z481l/3

var company = element.match(/@(.*)\./i)[1];

this works!! :-) https://regex101.com/r/6Z481l/4

var email = element.match(/<(.*@.*)>|(^[^<].*[^>])/i)[1];

surprisingly, this works!! :-) But I'm almost certain it can be made more elegant https://regex101.com/r/6Z481l/5

It's worth mentioning that the emails are presumed to be already validated.

So, I need some help to complete steps 1 and 3. If any working regexp from step 3 can be simplified or made more elegant, I'll learn from that!

Not the goal, but if you come up with ONE magic RegExp that splits the email like I need it, then you will certainly wow me and make me feel very small for my lack of RegExp knowledge!!! :-)

Thanks!

Upvotes: 1

Views: 96

Answers (2)

Matt.G

Reputation: 3609

I believe you should be able to get your expected end result using regex:

(?:(?:"?((\w+)\b.*\b)"?)\s)?<?(([\w@]*)@(\w*)\.[a-zA-Z]{2,3})>?,?

and replacing it with:

{ fullName:'\1', firstWord:'\2', localPart:'\4', company:'\5', email:'\3'}

See Demo
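In JavaScript, String.prototype.replace uses $1-style references rather than \1, so a rough way to try this pattern from JS is to read the capture groups directly. The sketch below assumes each entry has already been separated from the list, because the greedy .* could otherwise run across several entries. parseEntry is a hypothetical helper name, and the .com domains are placeholders, since the addresses in the question are obscured.

```javascript
// Pattern from the answer above, unchanged.
const pattern = /(?:(?:"?((\w+)\b.*\b)"?)\s)?<?(([\w@]*)@(\w*)\.[a-zA-Z]{2,3})>?,?/;

// Hypothetical helper: expects ONE already-separated entry, not the whole list.
function parseEntry(entry) {
  const m = pattern.exec(entry);
  if (!m) return null;
  return {
    fullName: m[1] || '',   // groups 1 and 2 are undefined for a bare address
    firstWord: m[2] || '',
    localPart: m[4],
    company: m[5],
    email: m[3]
  };
}

// Placeholder sample entries (assumed .com domains).
const sample = [
  'peter@pan.com',
  'lucky <jack@pot.com>',
  '"John Rambo, III" <johnny@rambo.com>'
];
console.log(sample.map(parseEntry));
```

Note that the {2,3} quantifier only accepts top-level domains of two or three letters, so longer ones would need that part relaxed.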

Upvotes: 1

wp78de

Reputation: 18980

You can split the string at a comma excluding those enclosed in quotes like this:

,(?=(?:[^'"]|'[^']*'|"[^"]*")*$)

This should allow you to get rid of step 1 & 2.
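As a minimal sketch of how that split might look in JavaScript: the sample list uses placeholder .com domains (the real addresses in the question are obscured), and the variable names are only illustrative. Since the pattern matches the comma alone, each piece keeps its leading space and is trimmed afterwards.

```javascript
// Placeholder sample list (assumed .com domains).
const emailList = 'peter@pan.com, lucky <jack@pot.com>, ' +
  '"William Tell" <billy@tell.com>, "John Rambo, III" <johnny@rambo.com>';

// A comma counts as a separator only if the rest of the string
// contains balanced quotes, i.e. the comma is not inside a quoted name.
const commaOutsideQuotes = /,(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/;

const entries = emailList
  .split(commaOutsideQuotes)
  .map(function (entry) { return entry.trim(); });

console.log(entries);
```

The lookahead rescans the remainder of the string for every comma, which is fine for short lists but may get slow on very long input.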

Regarding the non-functional patterns in step 3:

DOESN'T work: captures trailing "

  • (?|"([^"]+)"|(.*) <): first match balanced quotes, alternatively everything before <.
    Caveat: JS has no branch reset group (?|...), so there you have to use (?:"([^"]+)"|(.*) <) instead and check group 2 if group 1 is empty.

DOESN'T work: peter in peter@pan is not captured

  • (<|^)(.*)@: you could secondarily match from the start;
    however, this is troublesome since the pattern is not properly anchored.
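A rough JavaScript sketch of both ideas, run on single, already-separated entries. fullNameOf and localPartOf are hypothetical helper names, and the sample addresses in the usage lines use placeholder .com domains. The second pattern is a tightened variant of the one above, not the pattern itself: it refuses to cross < or whitespace, so the start-of-string branch cannot swallow a display name. It assumes the display name itself contains no @.

```javascript
// Hypothetical helpers; each expects one already-separated entry.
function fullNameOf(entry) {
  // Group 1: text inside balanced quotes. Group 2: everything before " <".
  const m = entry.match(/"([^"]+)"|(.*) </);
  if (!m) return '';                       // bare address, no display name
  return m[1] !== undefined ? m[1] : m[2]; // fall back to group 2
}

function localPartOf(entry) {
  // Start at "<" or at the beginning of the entry, then take everything
  // up to the last "@" without crossing "<" or whitespace.
  const m = entry.match(/(?:<|^)([^<\s]*)@/);
  return m ? m[1] : '';
}

console.log(fullNameOf('"John Rambo, III" <johnny@rambo.com>')); // John Rambo, III
console.log(fullNameOf('lucky <jack@pot.com>'));                 // lucky
console.log(localPartOf('peter@pan.com'));                       // peter
console.log(localPartOf('lucky <jack@pot.com>'));                // jack
```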

For the email validation part you should use one of the existing and recommended solutions. But that's another topic, I guess.

Upvotes: 1
