Reputation: 207
I'm attempting to pull a certain piece of text out of a variable that looks like this:
v1 <- c("Persons Name <[email protected]>","person 2 <[email protected]>")
(this variable has hundreds of observations)
I want to eventually make a second variable that pulls their email to give this output:
v2 <- c("[email protected]", "[email protected]")
How would I do this? Is there a certain package I can use? Or do I need to make a function incorporating grep and substr?
Upvotes: 7
Views: 328
Reputation: 32558
You can look for a pattern that looks like an email address using regexpr. If a match is found, extract the relevant part using substring. The starting position and match length are both provided by regexpr:
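# regexpr returns the start of the first match per element, or -1 if none
inds = regexpr(pattern = "<(.*@.*\\..*)>", v1)
# Drop the enclosing "<" and ">" from each match; NA where nothing matched
ifelse(inds > 0,
       substring(v1, inds + 1, inds + attr(inds, "match.length") - 2),
       NA)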
#[1] "[email protected]" "[email protected]"
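To illustrate how the start position and match length behave on awkward input, here is a minimal sketch using made-up data (the vector x and its addresses are hypothetical, not from the question). It assumes the same pattern as above and shows that regexpr reports -1 for an element without a match and 1 for a match that begins at the first character, so testing for a position greater than 0 covers both cases.

```r
# Hypothetical sample data: one normal entry, one without an address,
# and one that starts directly with "<"
x <- c("Alice Smith <alice@example.com>", "no address here", "<bob@example.org>")

inds <- regexpr("<(.*@.*\\..*)>", x)
len  <- attr(inds, "match.length")

# Start with NA everywhere, then fill in only the elements that matched
emails <- rep(NA_character_, length(x))
hit <- inds > 0
emails[hit] <- substring(x[hit], inds[hit] + 1, inds[hit] + len[hit] - 2)

emails
# [1] "alice@example.com" NA                  "bob@example.org"
```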
Upvotes: 2
Reputation: 28705
You can look for the pattern "anything**, then <, then (anything), then >, then anything" and replace that pattern with the part between the parentheses, referenced as \1 (written \\1 in an R string, because the backslash itself has to be escaped).
sub('.*<(.*)>.*', '\\1', v1)
# [1] "[email protected]" "[email protected]"
** "anything" actually means anything but line breaks
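One thing worth knowing about the sub approach: when the pattern does not match, sub returns the input string unchanged rather than NA. The sketch below uses hypothetical data (x is made up for illustration) and adds a grepl check to mark such elements as missing; whether NA is the right placeholder is an assumption about what you want downstream.

```r
# Hypothetical sample data: the second element has no <...> part
x <- c("Alice Smith <alice@example.com>", "no address here")

res <- sub(".*<(.*)>.*", "\\1", x)
# At this point res[2] is still "no address here", because nothing was replaced

# Flag elements that actually contain a <...> part and blank out the rest
has_addr <- grepl("<.*>", x)
res[!has_addr] <- NA

res
# [1] "alice@example.com" NA
```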
Upvotes: 3
Reputation: 206616
Those look like what R might call a "person". There is an as.person() function that can split out the email address. For example:
v1 <- c("Persons Name <[email protected]>","person 2 <[email protected]>")
unlist(as.person(v1)$email)
# [1] "[email protected]" "[email protected]"
For more information, see the ?person help page.
Upvotes: 18
Reputation: 887991
One option with str_extract from stringr:
library(stringr)
str_extract(v1, "(?<=\\<)[^>]+")
#[1] "[email protected]" "[email protected]"
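If you would rather not load a package, roughly the same lookbehind can be used in base R by combining regexpr with regmatches and perl = TRUE. This is a sketch on hypothetical data (x is made up for illustration), not a drop-in replacement: regmatches silently drops elements that have no match, whereas str_extract returns NA for them.

```r
# Hypothetical sample data
x <- c("Alice Smith <alice@example.com>", "Bob <bob@example.org>")

# (?<=<) asserts that a "<" precedes the match; [^>]+ then takes
# everything up to, but not including, the closing ">"
m <- regexpr("(?<=<)[^>]+", x, perl = TRUE)
emails <- regmatches(x, m)

emails
# [1] "alice@example.com" "bob@example.org"
```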
Upvotes: 3