cheklapkok

Reputation: 439

foreach and doParallel run without errors in R but do not return the correct result

I'm trying to use foreach to speed up replacing misspelled words in a larger data frame. My code runs without errors, but the result is not what I expect. Below is an example of my data frame and the code I used.

I have a main data frame and a second data frame of predefined misspellings and their corrections, which I use to find and replace text in the main data frame:

#create main data frame
df <- data.frame("Index" = 1:7, "Text" = c("Brad came to dinner with us tonigh.",
                                            "Wuld you like to trave with me?",
                                            "There is so muh to undestand.",
                                            "Sentences cone in many shaes and sizes.",
                                            "Learnin R is fun",
                                            "yesterday was Friday",
                                            "bing search engine"), stringsAsFactors = FALSE)

#create predefined misspelled data frame
df_r <- data.frame("misspelled" = c("tonigh", "Wuld", "trave", "muh", "undestand", "shaes", "Learnin"), 
                   "correction" = c("tonight", "Would", "travel", "much", "understand", "shapes", "Learning"))

library(DataCombine)
library(doParallel)
library(foreach)
no_cores <- detectCores()
cl <- makeCluster(no_cores[1]-1)
registerDoParallel(cl)

df_replacement <- foreach((df$Text), .combine = cbind) %dopar% {
  replacement = DataCombine::FindReplace(data = df, Var = "Text", replaceData = df_r,
                                             from = "misspelled", to = "correction", exact = FALSE)

  replacement
}
stopCluster(cl)

I'm not sure what I did wrong in the foreach part. Any advice is appreciated.

Upvotes: 1

Views: 255

Answers (1)

mischva11

Reputation: 2956

I think you are looking for this:

df_replacement <- foreach(i = (rownames(df)), .combine = rbind) %dopar% {
  replacement = DataCombine::FindReplace(data = df[i,], Var = "Text", replaceData = df_r,
                                         from = "misspelled", to = "correction", exact = FALSE)

  replacement
}

What's happening:

foreach understands that it has to run once per row. But your function always processes the whole data frame, so every iteration returns the whole data frame, which is two columns wide. .combine = cbind then joins those results by column: 2 (columns) * 7 (iterations) = 14 columns. So make sure FindReplace only receives the rows you want in each iteration, not the whole data frame.
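The shape difference can be reproduced without any parallel backend. The following is a minimal base R sketch, assuming a toy data frame with the same dimensions as the one in the question; the variable names are only for illustration, and do.call stands in for what .combine does with the per-iteration results.

```r
# Toy data frame with the same shape as the question's df (7 rows, 2 columns).
df <- data.frame(Index = 1:7, Text = letters[1:7], stringsAsFactors = FALSE)

# Original loop: each of the 7 iterations returns the whole data frame,
# and cbind places the 7 copies side by side.
whole_each_time <- replicate(nrow(df), df, simplify = FALSE)
combined_cbind <- do.call(cbind, whole_each_time)
dim(combined_cbind)  # 7 rows, 14 columns

# Corrected loop: each iteration returns a single row,
# and rbind stacks the rows back into the original shape.
one_row_each_time <- lapply(seq_len(nrow(df)), function(i) df[i, ])
combined_rbind <- do.call(rbind, one_row_each_time)
dim(combined_rbind)  # 7 rows, 2 columns
```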

I changed the code to pass only the row for each iteration, df[i,], to FindReplace. I also changed cbind to rbind, since you want to stack the rows afterwards, not the columns.
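One way to check the parallel result is to compare it against a plain sequential pass. This is a rough base R sketch, assuming that gsub with fixed = TRUE is an acceptable stand-in for FindReplace with exact = FALSE on this data; it is not DataCombine's implementation, and sequential_replace is a hypothetical helper name.

```r
# Data from the question.
df <- data.frame(Index = 1:7,
                 Text = c("Brad came to dinner with us tonigh.",
                          "Wuld you like to trave with me?",
                          "There is so muh to undestand.",
                          "Sentences cone in many shaes and sizes.",
                          "Learnin R is fun",
                          "yesterday was Friday",
                          "bing search engine"),
                 stringsAsFactors = FALSE)

df_r <- data.frame(misspelled = c("tonigh", "Wuld", "trave", "muh",
                                  "undestand", "shaes", "Learnin"),
                   correction = c("tonight", "Would", "travel", "much",
                                  "understand", "shapes", "Learning"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: apply every misspelled -> correction pair in turn.
# gsub is vectorised over the text, so no per-row loop is needed.
sequential_replace <- function(text, lookup) {
  for (k in seq_len(nrow(lookup))) {
    text <- gsub(lookup$misspelled[k], lookup$correction[k], text, fixed = TRUE)
  }
  text
}

expected <- sequential_replace(df$Text, df_r)
expected
```

The Text column of df_replacement from the foreach loop can then be compared with expected to spot rows that were not corrected.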

Upvotes: 1
