toomey8

Reputation: 333

Regular Expression in Base R Regex to identify email address

I am trying to use the stringr library to extract emails from a big, messy file.

str_match doesn't allow perl=TRUE, and I can't figure out the escape characters to get it to work.

Can someone recommend a relatively robust regex that would work in the context below?

library(stringr)

c("[email protected]", "[email protected]", "[email protected]")->emails
"SomeRegex"->regex
str_match(emails, regex)

Upvotes: 7

Views: 8988

Answers (3)

zerocool

Reputation: 369

Actually, I'd recommend a longer regex, since the other answers accept an address like [email protected]. with a trailing dot.

isMail <- function(x){
   # TRUE for each element of x that matches the pattern from start to end
   grepl("^[[:alnum:]._-]+@[[:alnum:].-]+$", x)
}
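Note that the pattern as shown here still accepts a trailing dot, because "." is part of the domain character class. A minimal sketch of a stricter variant follows, assuming the domain should be a series of dot-separated labels ending in an alphabetic label of at least two letters. The helper name is_mail_strict and the test addresses are made up for illustration, and this is not a complete validator (it rejects, for example, a "+" in the local part).

```r
# Hypothetical helper and made-up addresses, for illustration only.
is_mail_strict <- function(x) {
  # local part, "@", one or more dot-terminated labels,
  # then a final label made of 2 or more letters
  grepl("^[[:alnum:]._-]+@([[:alnum:]-]+\\.)+[[:alpha:]]{2,}$", x)
}

candidates <- c("user@example.com", "user@example.com.", "user@example")
strict_result <- is_mail_strict(candidates)
print(strict_result)
# [1]  TRUE FALSE FALSE
```

Because the domain must end in letters, the address with the trailing dot and the one with no dot at all are both rejected.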

Upvotes: 0

Ken Taylor

Reputation: 170

I found this regex worked better for me:

^[[:alnum:]._-]+@[[:alnum:].-]+$

A dash does have a special meaning in a character class unless it is the first or last character: it is a range operator, as in "A-Z".
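To see the effect of the dash position, here is a small sketch in base R. The two test strings are made up, and the comparison assumes R's default regex engine, which reads ".-_" as the range of characters from "." to "_".

```r
# Made-up test strings, chosen only to show the difference.
x <- c("a-b", "a@b")

# Dash in the middle: ".-_" is a range from "." to "_".
# That range contains "@" but not "-".
dash_middle <- grepl("^[[:alnum:].-_]+$", x)
print(dash_middle)
# [1] FALSE  TRUE

# Dash last: ".", "_" and "-" are three literal characters.
dash_last <- grepl("^[[:alnum:]._-]+$", x)
print(dash_last)
# [1]  TRUE FALSE
```

With the dash in the middle, the class silently accepts "@" (and other punctuation in that range) while rejecting the dash it was meant to allow.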

Upvotes: 4

IRTFM

Reputation: 263461

> "^[[:alnum:].-_]+@[[:alnum:].-]+$"->regex
> str_match(emails, regex)
     [,1]                   
[1,] "[email protected]"      
[2,] "[email protected]"
[3,] "[email protected]"

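The @-sign does not need escaping in regex, and "." is not special inside a character class. A "-" is literal only when it is the first or last character of the class; elsewhere it forms a range, so ".-_" in the pattern above is read as the range from "." to "_" rather than as three literal characters. If you want to add a requirement for ".com", ".co", ".edu", ".org" then you should specify how complete that list needs to be.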
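If a fixed list of endings is wanted, one way to sketch it is to build the alternation from a vector. The list tlds below is an assumption and deliberately incomplete, the addresses are made up, and base R's grepl is used only to keep the example self-contained; the same pattern string should also be usable with str_match.

```r
# Assumed, incomplete list of allowed endings (illustrative only).
tlds <- c("com", "co", "edu", "org")

# Local part, "@", domain, then a literal dot and one of the allowed endings.
tld_regex <- paste0(
  "^[[:alnum:]._-]+@[[:alnum:].-]+\\.(",
  paste(tlds, collapse = "|"),
  ")$"
)

tld_result <- grepl(tld_regex, c("a@example.org", "a@example.xyz"))
print(tld_result)
# [1]  TRUE FALSE
```

Keeping the endings in a vector makes it easy to extend the list once it is decided how complete it needs to be.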

As pointed out by M42, this is not a sure-fire method. In fact, it is claimed that there is no sure-fire method; see "Using a regular expression to validate an email address".

Upvotes: 10
