Reputation: 8247
I have the following pattern in my column:
[email protected]
[email protected]
Now I want to extract the text after @ and before ., i.e. gmail and hotmail. I am able to extract the text after @ with the following code:
sub(".*@", "", email)
How can I modify the above to fit my use case?
Upvotes: 5
Views: 5851
Reputation: 349
This is @hrbrmstr's function with stringr:
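library(magrittr)  # provides %>%

# For an address with a single "@", element 2 of the match matrix is its position
stringr::str_locate_all(email, "@") %>% purrr::map_int(~ .[2]) %>%
  purrr::map2_df(email, ~ {
    # keep everything after the "@" and split it into subdomain, domain and suffix
    stringr::str_sub(.y, .x + 1, nchar(.y)) %>%
      urltools::suffix_extract()
  })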
Upvotes: 1
Reputation: 78832
Your data may contain addresses where @ appears in more than one place, or addresses such as "[email protected]", "[email protected]" (i.e. naively assuming only a simple domain could come back to bite you at some point in this analysis).
So, unless you know for sure that you have and always will have simple email addresses, might I suggest:
library(stringi)
library(urltools)
library(dplyr)
library(purrr)
emails <- c("[email protected]", "[email protected]",
            "[email protected]",
            "[email protected]",
            "[email protected]")
stri_locate_last_fixed(emails, "@")[,"end"] %>%
  map2_df(emails, function(x, y) {
    substr(y, x+1, nchar(y)) %>%
      suffix_extract()
  })
##                         host   subdomain     domain suffix
## 1                  gmail.com        <NA>      gmail    com
## 2                hotmail.com        <NA>    hotmail    com
## 3      deparment.example.com  department    example    com
## 4 yet.another.department.com yet.another department    com
## 5             froodyco.co.uk        <NA>  froodyorg  co.uk
Note the proper splitting of subdomain, domain & suffix, especially for the last one.
Knowing this, we can then change the code to:
stri_locate_last_fixed(emails, "@")[,"end"] %>%
  map2_chr(emails, function(x, y) {
    substr(y, x+1, nchar(y)) %>%
      suffix_extract() %>%
      mutate(full_domain=ifelse(is.na(subdomain), domain, sprintf("%s.%s", subdomain, domain))) %>%
      select(full_domain) %>%
      flatten_chr()
  })
## [1] "gmail" "hotmail"
## [3] "department.example" "yet.another.department"
## [5] "froodyorg"
Upvotes: 8
Reputation: 43189
You can use:
emails <- c("[email protected]", "[email protected]")
emails_new <- sub(".*@(.+)$", "\\1", emails)
emails_new
# [1] "gmail.com" "hotmail.com"
See a demo on ideone.com.
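If only the part between the @ and the first following dot is wanted, as the question asks, the capture group can be narrowed. The following is a minimal sketch in base R, not the answer author's code; the addresses are hypothetical and made up for illustration.

```r
# Hypothetical addresses, used only for illustration
emails <- c("user@gmail.com", "someone@hotmail.com")

# "^.*@"    : everything up to and including the last "@"
# "([^.]+)" : capture the run of non-dot characters that follows
# "\\..*$"  : the first dot after that and the rest of the string
provider <- sub("^.*@([^.]+)\\..*$", "\\1", emails)
print(provider)
```

Note that sub() returns its input unchanged when the pattern does not match, so an address with no dot after the @ would pass through as-is.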
Upvotes: 3
Reputation: 887741
We can use gsub to remove everything up to the @ and everything from the first dot after it:
gsub(".*@|\\..*", "", email)
#[1] "gmail" "hotmail"
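The pattern above has two alternatives, and it may help to see what each one removes. This is a rough sketch using hypothetical addresses (not taken from the question), including one with a subdomain to show where the simple approach stops being enough.

```r
# Hypothetical addresses, made up for illustration
email <- c("user@gmail.com",
           "first.last@hotmail.com",
           "someone@mail.example.co.uk")

# ".*@"   : greedy, removes everything up to and including the last "@"
# "\\..*" : removes everything from the first remaining "." to the end
domain <- gsub(".*@|\\..*", "", email)
print(domain)
# [1] "gmail"   "hotmail" "mail"
```

For the third address the result is the leftmost label of the host ("mail"), not the registered domain, which is the case a public-suffix-aware tool such as urltools::suffix_extract() is meant to handle.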
Upvotes: 5