Mr. Biggums

Reputation: 207

Extracting a certain substring (email address)

I'm attempting to pull a certain substring from a variable that looks like this:

v1 <- c("Persons Name <[email protected]>","person 2 <[email protected]>")

(this variable has hundreds of observations)

I want to eventually make a second variable that pulls their email to give this output:

v2 <- c("[email protected]", "[email protected]")

How would I do this? Is there a certain package I can use? Or do I need to make a function incorporating grep and substr?

Upvotes: 7

Views: 328

Answers (4)

d.b

Reputation: 32558

You can look for a pattern that looks like an email address using regexpr. If a match is found, extract the relevant part using substring. regexpr provides both the starting position and the length of the match.

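# regexpr() returns the start of the first match, or -1 when there is none
inds = regexpr(pattern = "<(.*@.*\\..*)>", v1)
# skip the leading "<" and trailing ">" of each match
ifelse(inds > 0,
       substring(v1, inds + 1, inds + attr(inds, "match.length") - 2),
       NA)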
#[1] "[email protected]" "[email protected]"
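As a variation on the regexpr approach above, the matched text can also be pulled out with regmatches instead of computing offsets by hand. This is only a sketch: the sample addresses are hypothetical placeholders, the third element is added to show the no-match case, and the pattern (lookarounds with perl = TRUE) is one possible choice rather than a full email validator.

```r
# Hypothetical sample data; the addresses are placeholders, not the asker's.
v1 <- c("Persons Name <person1@example.com>",
        "person 2 <person2@example.com>",
        "no email here")

# Lookbehind/lookahead keep the angle brackets out of the match itself.
m <- regexpr("(?<=<)[^<>@\\s]+@[^<>@\\s]+(?=>)", v1, perl = TRUE)

# regmatches() returns only the successful matches, so fill them into
# a vector of NAs to keep v2 the same length as v1.
v2 <- rep(NA_character_, length(v1))
v2[m != -1] <- regmatches(v1, m)
v2
```

Elements with no match stay NA, so v2 lines up with v1 and can be added as a new column directly.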

Upvotes: 2

IceCreamToucan

Reputation: 28705

You can look for the pattern "anything**, then <, then (anything), then >, then anything" and replace the whole match with the part between the parentheses. That part is referred to as \1, written '\\1' in an R string because the backslash itself has to be escaped.

sub('.*<(.*)>.*', '\\1', v1)
# [1] "[email protected]" "[email protected]" 

** "anything" actually means anything but line breaks
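One thing worth knowing about the sub approach: when the pattern does not match, sub returns the input string unchanged rather than NA. A minimal sketch of guarding against that with grepl follows; the sample data is hypothetical (the second element is added to show the no-match case), and the names out and has_email are just for illustration.

```r
# Hypothetical sample data; the address is a placeholder.
v1 <- c("Persons Name <person1@example.com>", "no email here")

# Without a match, sub() leaves the string as it was.
out <- sub(".*<(.*)>.*", "\\1", v1)

# Flag which elements actually contain <...> and blank out the rest.
has_email <- grepl("<.*>", v1)
v2 <- ifelse(has_email, out, NA_character_)
v2
```

If every element is known to contain an address in angle brackets, the plain sub call is enough and the guard can be dropped.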

Upvotes: 3

MrFlick

Reputation: 206616

Those look like what R might call a "person". There is an as.person() function that can split out the email address. For example

v1 <- c("Persons Name <[email protected]>","person 2 <[email protected]>")
unlist(as.person(v1)$email)
# [1] "[email protected]" "[email protected]"

For more information, see the ?person help page.

Upvotes: 18

akrun

Reputation: 887991

One option is str_extract from stringr, with a lookbehind that matches everything after the < up to, but not including, the >:

library(stringr)
str_extract(v1, "(?<=\\<)[^>]+")
#[1] "[email protected]" "[email protected]"  

Upvotes: 3
