Maddie

Reputation: 87

R partial String Match - exclude

I have a list of emails that I want to clean. If an email does not contain the '@' character, I want to remove it, so that an input like 'mywebsite.com' is removed.

My code is as follows:

  email_clean <- function(email, invalid = NA){
    email <- trimws(email)                                                          # Removes whitespace
    email[(nchar(email) %in% c(1,2)) ] <- invalid                                   # Removes emails with 1 or 2 character length
    bad_email <- c("\\@no.com", "\\@na.com","\\@none.com","\\@email.com",           # List of bad emails - modify to the 
                   "\\@noemail.com", "\\@test.com")                                 # specifications of the request

    pattern = paste0("(?i)\\b",paste0(bad_email,collapse="\\b|\\b"),"\\b")          # Marks emails matching a bad domain as invalid
    email <-gsub(pattern, invalid, sapply(email,as.character))
    unname(email)
  }

  ## Define vector of cleaned emails from original csv column
  Cleaned_Email <- email_clean(my_data$Email)


  ## Binds cleaned emails to csv data
  my_data<-cbind(my_data,Cleaned_Email)

Thanks!!

Upvotes: 0

Views: 998

Answers (2)

Pierre L

Reputation: 28441

  email_clean <- function(email, invalid = NA){
    email <- trimws(email)                                                          # Removes whitespace
    email[(nchar(email) %in% c(1,2)) ] <- invalid                                   # Removes emails with 1 or 2 character length
    email[!grepl("@", email)] <- invalid  # <------------------ New line added here ------------
    bad_email <- c("\\@no.com", "\\@na.com","\\@none.com","\\@email.com",           # List of bad emails - modify to the 
                   "\\@noemail.com", "\\@test.com")                                 # specifications of the request

    pattern = paste0("(?i)\\b",paste0(bad_email,collapse="\\b|\\b"),"\\b")          # Marks emails matching a bad domain as invalid
    email <-gsub(pattern, invalid, sapply(email,as.character))
    unname(email)
  }
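
To see what the added line does on its own, here is a minimal sketch on a made-up vector. The sample addresses are hypothetical and not taken from the question's data; `fixed = TRUE` is an optional choice that treats "@" as a literal string rather than a regular expression.

```r
# Hypothetical sample input (not the asker's data)
emails <- c("someone@example.org", "mywebsite.com", " other@example.org ", NA)

# Strip surrounding whitespace first, as email_clean() does
emails <- trimws(emails)

# Any entry without an "@" is marked invalid (NA here).
# grepl() returns FALSE for NA input, so existing NAs simply stay NA.
emails[!grepl("@", emails, fixed = TRUE)] <- NA

print(emails)
```

Entries that already were NA remain NA, so the line should be safe to run on a column with missing values.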

Upvotes: 3

Gopala

Reputation: 10483

Try this to exclude any rows in my_data that don't have an '@' sign in the Email column:

  my_data <- my_data[grep('@', my_data$Email), ]
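
A possible variant of the same idea uses `grepl()` (a logical mask) instead of `grep()` (row indices). This is only a sketch: the data frame below is a hypothetical stand-in for my_data, and `drop = FALSE` is an assumption added so the result stays a data frame even when there is only one column.

```r
# Hypothetical stand-in for my_data (the real data is not shown in the question)
my_data <- data.frame(
  Email = c("someone@example.org", "mywebsite.com", NA),
  stringsAsFactors = FALSE
)

# TRUE where Email contains "@"; grepl() gives FALSE for NA,
# so rows with a missing Email are dropped as well
keep <- grepl("@", my_data$Email, fixed = TRUE)

# drop = FALSE keeps the result a data frame even with a single column
my_data <- my_data[keep, , drop = FALSE]

print(my_data)
```

Note that this removes whole rows, whereas the function in the question keeps the rows and replaces invalid emails with NA.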

Upvotes: 0
