Choc_waffles

Reputation: 537

Partial Match String and full replacement over multiple vectors

I would like to efficiently replace every string in a single column that partially matches any of a supplied vector of strings, where the supplied strings act as both the search patterns and the replacements. That is, each element of df below is checked for a partial match against each element of vec_string. Where a match is found, the entire string is replaced with the matching element of vec_string, e.g. turning 'subscriber manager' into 'manager'. Adding more strings to vec_string should make it search the whole of df for those as well.

I have started the function, but can't work out how to finish it off by replacing the matched elements of df with the corresponding element of vec_string. I'd appreciate your help.

df <- c(
'solicitor'
,'subscriber manager'
,'licensed conveyancer'
,'paralegal'
,'property assistant'
,'secretary'
,'conveyancing paralegal'
,'licensee'
,'conveyancer'
,'principal'
,'assistant'
,'senior conveyancer'
,'law clerk'
,'lawyer'
,'legal practice director'
,'legal secretary'
,'personal assistant'
,'legal assistant'
,'conveyancing clerk')

vec_string <- c('manager','law')

# function to search for each pattern and return the matching indices
replace_func <-
  function(vec, str_vec) {
    repl_str <- list()
    for (i in seq_along(str_vec)) {
      repl_str[[i]] <- grep(str_vec[i], unique(tolower(vec)))
    }
    names(repl_str) <- str_vec  # use the argument, not the global vec_string
    return(repl_str)
  }

replace_func(df,vec_string)

$manager
[1] 2

$law
[1] 13 14

As you can see, the function returns a named list holding the indices of the elements to which the replacement will be applied.

Upvotes: 0

Views: 40

Answers (1)

boski

Reputation: 2467

This should do the trick

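res <- sapply(df, function(x) {
  # positions of the vec_string patterns that occur in x
  hits <- which(sapply(vec_string, function(y) grepl(y, x)))
  # return the first matching pattern, otherwise keep x unchanged
  if (length(hits) > 0) vec_string[hits[1]] else x
}, USE.NAMES = FALSE)  # drop names so the result prints as a plain vector
res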

 [1] "solicitor"               "manager"                 "licensed conveyancer"   
 [4] "paralegal"               "property assistant"      "secretary"              
 [7] "conveyancing paralegal"  "licensee"                "conveyancer"            
[10] "principal"               "assistant"               "senior conveyancer"     
[13] "law"                     "law"                     "legal practice director"
[16] "legal secretary"         "personal assistant"      "legal assistant"        
[19] "conveyancing clerk" 

We compare each element of df with each element of vec_string. If there is a match, the matching vec_string element is returned; otherwise the original string is left as it is. Watch out: if more than one pattern matches, only the first one is kept.
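
If df is long, the same idea can probably be written with one vectorised grepl call per pattern instead of one call per element. This is only a sketch: replace_partial is a made-up helper name, the sample vector is a shortened copy of df, and ignore.case = TRUE is my assumption based on the tolower() in the question. Patterns are applied in reverse order so that, as above, the first pattern in the vector wins when several match.

```r
# Hypothetical helper: replace every element of vec that partially
# matches a pattern with that pattern itself.
replace_partial <- function(vec, patterns) {
  out <- vec
  # Loop in reverse so earlier patterns are assigned last and take precedence.
  for (p in rev(patterns)) {
    # Match against the original vec so replaced values are never re-matched.
    out[grepl(p, vec, ignore.case = TRUE)] <- p
  }
  out
}

# Shortened copy of the data from the question
jobs <- c('solicitor', 'subscriber manager', 'law clerk', 'lawyer')
replace_partial(jobs, c('manager', 'law'))
```

Note that the patterns are still treated as regular expressions; passing fixed = TRUE to grepl (without ignore.case, which does not apply to fixed matching) would treat them as literal strings instead.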

Upvotes: 1
