HunterThomas

Reputation: 31

Is there an efficient way in R to search and replace words in multiple strings in a tibble?

I have a tibble and in this tibble I have a column named "description". There are about 380,000 descriptions here.

An example of a description:

"Abbreviations are very hlpful"

This is just an example to familiarize you with my data. All descriptions are different.

I also have a tibble with correctly spelled words. There are approximately 42,000 unique correctly spelled words.

My task is to replace every misspelled word in the descriptions with its correctly spelled counterpart. So the word "hlpful" would be replaced with "helpful".

My code is as follows:

countKeyWords <- 1
countDescriptions <- 1
amountKeyWords <- 42083        # number of rows in keyWords
amountDescriptions <- 379571   # number of rows in semiFormatTet
# <= so that the last keyword and the last description are processed too
while (countKeyWords <= amountKeyWords){
  while (countDescriptions <= amountDescriptions){
    semiFormatTet$description[countDescriptions] <-
      gsub(keyWords$SearchFor[countKeyWords], keyWords$Map[countKeyWords], semiFormatTet$description[countDescriptions], ignore.case = TRUE)
    countDescriptions <- countDescriptions + 1
  }
  countDescriptions <- 1       # R vectors are 1-indexed, so restart at 1
  countKeyWords <- countKeyWords + 1
}

Note:

As is, the loop body would execute close to 16,000,000,000 times (42,083 keywords × 379,571 descriptions). That is very inefficient. How can I make this more efficient so that I do not have to wait a month for it to finish?

Upvotes: 0

Views: 47

Answers (1)

Mike V

Reputation: 1364

If I understand the question correctly, is this what you are looking for?

library(stringr)  # provides str_replace_all(); also attached by library(tidyverse)
df <- data.frame(DESCRIPTION = c("This is the first description with hlpful", 
                              "This is the second description with hlpful", 
                              "This is the third description with hlpful", 
                              "This is the fourth description with hlpful", 
                              "This is the fifth description with hlpful", 
                              "This is the sixth description with hlpful",
                              "This is the seventh description with hlpful",
                              "This is the eighth description with hlpful",
                              "This is the ninth description with hlpful"))

df$DESCRIPTION <- str_replace_all(df$DESCRIPTION,"hlpful", "helpful")

      DESCRIPTION
1   This is the first description with helpful
2  This is the second description with helpful
3   This is the third description with helpful
4  This is the fourth description with helpful
5   This is the fifth description with helpful
6   This is the sixth description with helpful
7 This is the seventh description with helpful
8  This is the eighth description with helpful
9   This is the ninth description with helpful
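
The call above handles one known misspelling. For the full mapping table from the question, one possible approach is to look up every word of every description in the table in a single vectorized pass, instead of running one regex per keyword. Below is a minimal base R sketch of that idea. It assumes the `SearchFor` and `Map` columns from the question; the sample data and the helper name `fix_words()` are made up for illustration. Note that, unlike `gsub()` with a bare pattern, this only replaces whole words (runs of letters), so "hlpful" inside a longer word would be left alone.

```r
# Hypothetical sample data; the column names SearchFor and Map follow the question.
keyWords <- data.frame(
  SearchFor = c("hlpful", "abreviations"),
  Map       = c("helpful", "abbreviations"),
  stringsAsFactors = FALSE
)
descriptions <- c(
  "Abbreviations are very hlpful",
  "HLPFUL tips, no typos",
  "12345"
)

# fix_words() is an illustrative helper, not part of any package.
fix_words <- function(x, search_for, map) {
  # Locate every run of letters (a "word") in every string.
  m <- gregexpr("[[:alpha:]]+", x)
  words <- regmatches(x, m)                  # list: one character vector per string
  flat <- unlist(words, use.names = FALSE)   # all words in one long vector

  # One case-insensitive lookup for all words at once.
  idx <- match(tolower(flat), tolower(search_for))
  found <- !is.na(idx)
  flat[found] <- map[idx[found]]

  # Put the (possibly corrected) words back into their original positions.
  regmatches(x, m) <- relist(flat, words)
  x
}

fixed <- fix_words(descriptions, keyWords$SearchFor, keyWords$Map)
print(fixed)
```

On the real data this would be called as something like `semiFormatTet$description <- fix_words(semiFormatTet$description, keyWords$SearchFor, keyWords$Map)`. If your misspellings contain apostrophes, hyphens, or digits, the `[[:alpha:]]+` pattern would need adjusting, since it splits words at those characters.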

Upvotes: 1
