Remove one word if it appears in the string with another

Question

I have a list of strings like this:

string <- c("tasty apple", "tasty orange", "yellow banana", "red tasty peach", "tasty banana apple", "tasty apple yellow banana", "yellow orange banana", "peach tasty apple", "yellow banana tasty peach")

When there is just one type of fruit in the string, it is fine. However, when a string contains two fruits, I have a list of patterns and replacements that says which fruit to keep:

pattern <- c("banana apple", "banana orange", "peach apple", "banana peach")
replacement <- c("apple", "banana", "peach", "banana")

I can remove one of the fruits when they are next to each other in the string. However, in my data there can be other words between them, and I do not know how to remove the word in that case. The order of the words in the string may differ as well.

I want it to be like this:

Before	After
tasty apple	tasty apple
tasty orange	tasty orange
yellow banana	yellow banana
red tasty peach	red tasty peach
tasty banana apple	tasty apple
tasty apple yellow banana	tasty apple yellow
yellow orange banana	yellow banana
peach tasty apple	peach tasty
yellow banana tasty peach	yellow banana tasty

JKupzig · Accepted Answer

Here is a simple solution using a nested for loop. The idea is to (1) invert each replacement, so that it names the word to delete rather than the word to keep, (2) detect the strings that contain every word of a pattern, and (3) delete the word found in (1) from those strings:

    # (1) The word to delete is what remains of each pattern once its
    #     replacement (the word to keep) has been removed.
    reverse_replacement <- stringr::str_squish(stringr::str_remove(pattern, replacement))

    for (index in seq_along(string)) {
      for (index_pattern in seq_along(pattern)) {
        words_pattern <- stringr::str_split(pattern[index_pattern], " ")[[1]]
        # (2) TRUE for each pattern word that occurs in the current string
        words <- stringr::str_detect(string[index], words_pattern)

        if (all(words)) {
          # (3) Delete the unwanted word; str_squish() also collapses the
          #     double space that the deletion leaves behind.
          string[index] <- stringr::str_squish(
            stringr::str_remove(string[index], reverse_replacement[index_pattern]))
        }
      }
    }

    string
    [1] "tasty apple"         "tasty orange"        "yellow banana"       "red tasty peach"    
    [5] "tasty apple"         "tasty apple yellow"  "yellow banana"       "peach tasty"        
    [9] "yellow banana tasty"
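
One thing to be aware of: `str_detect()` matches substrings, so "apple" would also be found inside a word such as "pineapple". If that matters for your data, the same three steps could be written with whole-word matching. The following is only a sketch in base R, not a tested drop-in: `remove_paired_word()` is a hypothetical helper name, and it assumes every pattern consists of single words separated by one space, with the replacement being one of those words.

```r
string <- c("tasty apple", "tasty orange", "yellow banana", "red tasty peach",
            "tasty banana apple", "tasty apple yellow banana",
            "yellow orange banana", "peach tasty apple",
            "yellow banana tasty peach")
pattern <- c("banana apple", "banana orange", "peach apple", "banana peach")
replacement <- c("apple", "banana", "peach", "banana")

# Hypothetical helper (an assumption, not part of the loop above): for every
# pattern whose words all occur in x as whole words, delete the word that is
# not the replacement.
remove_paired_word <- function(x, pattern, replacement) {
  for (i in seq_along(pattern)) {
    words <- strsplit(pattern[i], " ", fixed = TRUE)[[1]]
    drop <- setdiff(words, replacement[i])  # the word that must go

    # TRUE where every word of the pattern is present as a whole word
    hits <- lapply(words, function(w) {
      grepl(paste0("\\b", w, "\\b"), x, perl = TRUE)
    })
    has_all <- Reduce(`&`, hits)

    x[has_all] <- gsub(paste0("\\b", drop, "\\b"), "", x[has_all], perl = TRUE)
  }
  # Collapse the gaps left by the deletions and trim both ends
  trimws(gsub("\\s+", " ", x))
}

result <- remove_paired_word(string, pattern, replacement)
```

Because the function takes and returns a vector, it can be applied to a data frame column directly, for example `df$fruit <- remove_paired_word(df$fruit, pattern, replacement)` (with `df$fruit` standing in for your own column).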
