IMTheNachoMan
IMTheNachoMan

Reputation: 5849

how to extract all email addresses from a user inputted list in Google Sheets with REGEXEXTRACT

I have a user input with a list of email addresses. They separator could be anything like a space, comma, semicolon, tab, new line, multiple separators, etc...

I'm trying to "sanitize" the input and product just a comma separated list of valid email addresses.

This is the regular expression I am using for a valid e-mail address: ([a-z0-9'_.-]+?@[a-z0-9_.-]+?\.[a-z]{2,3})

This is some sample input:

[email protected] [email protected], [email protected]       ;;;;**a23asdf =_ [email protected]
:
:;,
       [email protected]"

REGEXEXTRACT only extracts the first match.

enter image description here

How can I get it to extract all of the found matches?

I have seen online tricks saying to have two REGEXEXTRACT calls, one to replace the matches, then another to replace whats left. While that works, I am wondering if its possible to get REGEXEXTRACT to extract all matches. Like a global modifier for the regex.

Upvotes: 3

Views: 2914

Answers (3)

Max Makhrov
Max Makhrov

Reputation: 18727

Please try:

=REGEXEXTRACT(A1;"\Q"&REGEXREPLACE(A1;"\w\S*@\S+\w";"\\E(.*)\\Q")&"\E")

Reference:

https://github.com/google/re2/blob/main/doc/syntax.txt

Upvotes: 3

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60494

This seems to work:

=transpose(arrayformula(filter(REGEXEXTRACT(SPLIT(A1," "),"([a-z0-9'_.-]+?@[a-z0-9_.-]+?\.[a-z]{2,3})"),not(isna(REGEXEXTRACT(SPLIT(A1," "),"([a-z0-9'_.-]+?@[a-z0-9_.-]+?\.[a-z]{2,3})"))))))
  • Split the string into an array on spaces
  • Use your regex to validate each array member
  • Filter out the non-matching members
  • Use arrayformula to return all of the matching members
  • Transpose to return a vertical array

enter image description here

Upvotes: 1

JvdV
JvdV

Reputation: 76000

It's going to be relatively tough through REGEXEXTRACT since it won't support global modifier. But some alternative could be (still using your pattern):

=TEXTJOIN(", ",1,QUERY(TRANSPOSE(SPLIT(REGEXREPLACE(A1,"([a-z0-9'_.-]+?@[a-z0-9_.-]+?\.[a-z]{2,3})","|$1|"),"|")),"Select Col1 where Col1 like '%@%'"))

Or a little simpler:

=ARRAYFORMULA(TEXTJOIN(", ",1,IFERROR(REGEXEXTRACT(SPLIT(A1," "),"([a-z0-9'_.-]+?@[a-z0-9_.-]+?\.[a-z]{2,3})"),"")))

Just be aware that the REGEXREPLACE now uses | which you might need to change depending on what characters may or may not occur.

Upvotes: 4

Related Questions