Reputation: 8458
I have a character vector (myVector) which contains several instances of email addresses scattered through a long string of semi-cleaned HTML stored in a single entry in the vector.
I know the relevant domain name ("@domain.com") and I want to extract each email address associated with that domain name (e.g. "[email protected]") preceded by white space.
I have tried the following code, but it doesn't deliver the right substring indices:
gregexpr("\\s [email protected]", myVector)
Any thoughts on (a) how I can fix the regular expression, and (b) whether there is a more elegant solution?
Upvotes: 1
Views: 787
Reputation: 50753
You want whitespace followed by non-whitespace characters, so gregexpr("\\s\\[email protected]", myVector) should work (note that each match includes the leading whitespace character).
As an alternative, take a look at the stringr package:
library(stringr)
str_extract_all(myVector, "\\s\\[email protected]")
Or use str_extract_all(myVector, "\\[email protected]"), which also returns addresses at the start of the string (and without the leading space).
Examples:
myVector <- "[email protected] and [email protected] and [email protected]. What about:[email protected] and [email protected]"
gregexpr("\\s\\[email protected]", myVector)
# [[1]]
# [1] 19 38 61 87
# attr(,"match.length")
# [1] 15 17 22 16
# attr(,"useBytes")
# [1] TRUE
str_extract_all(myVector, "\\s\\[email protected]")
# [1] " [email protected]" " [email protected]" " about:[email protected]"
# [4] " [email protected]"
str_extract_all(myVector, "\\[email protected]")
# [1] "[email protected]" "[email protected]" "[email protected]"
# [4] "about:[email protected]" "[email protected]"
(The about:four match is a corner case to think about: the address is preceded by a colon rather than whitespace.)
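If you want to stay in base R, gregexpr() only gives positions, but regmatches() can turn them into the matched substrings. The following is a minimal sketch, not a definitive solution: the sample text and addresses are made up, and the character class used for the local part is an assumption about what a valid address looks like in your data. The lookbehind requires whitespace or the start of the string before the address without including it in the match, so the colon-prefixed corner case is skipped.

```r
# Hypothetical sample text; the addresses are invented for illustration.
txt <- "alice@domain.com and bob@domain.com. What about:carol@domain.com and dave@other.com"

# (?<!\\S)  : the previous character is whitespace, or we are at the start
# [...]+    : assumed set of characters allowed in the local part
# \\.       : a literal dot (an unescaped "." would match any character)
# \\b       : stop "domain.com" from matching inside e.g. "domain.company"
pattern <- "(?<!\\S)[A-Za-z0-9._%+-]+@domain\\.com\\b"

# perl = TRUE is needed for the lookbehind; regmatches() extracts the substrings.
emails <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
emails
```

Whether the colon-prefixed address should be kept or dropped depends on your data; drop the lookbehind if you want it included.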
Upvotes: 1
Reputation: 16277
Using grep and value = TRUE:
str1 <- "Long text with email addresses [email protected] and [email protected] throughout [email protected]"
str1 <- unlist(strsplit(str1, " ")) # split on spaces
grep("@domain.com", str1, value = TRUE)
#[1] "[email protected]" "[email protected]"
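One caveat with this approach: grep() treats the pattern as a regular expression by default, so the "." in "@domain.com" matches any character. A small sketch of the difference, using an invented string (str2, tokens and hits are hypothetical names, and the addresses are made up), where fixed = TRUE makes grep() match the domain literally:

```r
# Hypothetical input; "b@domainxcom.org" is a deliberate near-miss.
str2 <- "Mail a@domain.com or b@domainxcom.org then c@domain.com"

# Split on single spaces, as in the answer above.
tokens <- unlist(strsplit(str2, " ", fixed = TRUE))

# fixed = TRUE treats the pattern as a literal string, so "." only matches ".".
hits <- grep("@domain.com", tokens, value = TRUE, fixed = TRUE)

# Without fixed = TRUE the near-miss is returned as well.
loose <- grep("@domain.com", tokens, value = TRUE)
```

Here hits holds only the two real domain.com addresses, while loose also picks up the near-miss.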
Upvotes: 1
Reputation: 4109
I tried to replicate your question with a small example by creating a single string that has a few emails included in it.
> foo = "[email protected] some filler text to use an [email protected] example for this
[email protected] question [email protected] that OP has has asked"
> strsplit(foo, " ")
[[1]]
[1] "[email protected]" "some" "filler"
[4] "text" "to" "use"
[7] "an" "[email protected]" "example"
[10] "for" "this\[email protected]" "question"
[13] "[email protected]" "that" "OP"
[16] "has" "has" "asked"
> strsplit(foo, " ")[[1]][grep("@gmail.com", strsplit(foo, " ")[[1]])]
[1] "[email protected]" "[email protected]" "this\[email protected]"
[4] "[email protected]"
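In the output above, one result still has the preceding word attached because the string contains a line break, and splitting on a single space does not separate it. A possible refinement, sketched with an invented string (foo2, words and gmail are hypothetical names, and the addresses are made up), is to split on any run of whitespace and to call strsplit() only once:

```r
# Hypothetical input with an embedded newline, mirroring the example above.
foo2 <- "a@gmail.com some filler text this\nb@gmail.com question c@gmail.com asked"

# Split on one or more whitespace characters (spaces, tabs, newlines).
words <- strsplit(foo2, "[[:space:]]+")[[1]]

# Keep tokens that end with the domain; endsWith() compares literally, no regex.
gmail <- words[endsWith(words, "@gmail.com")]
gmail
```

endsWith() is in base R from version 3.3.0. It will miss addresses followed directly by punctuation, so a regex-based extraction may be a better fit for messier text.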
Upvotes: 1