Reputation: 439
I'm trying to use a foreach loop to speed up the replacement of misspelled words in a larger data frame. My code runs without errors, but I'm not getting the correct result. Please see below for an example of my data frame and the code I used.
I have a main data frame and a second data frame that I use to find and replace predefined misspelled text in the main data frame:
# create main data frame
df <- data.frame("Index" = 1:7, "Text" = c("Brad came to dinner with us tonigh.",
                                           "Wuld you like to trave with me?",
                                           "There is so muh to undestand.",
                                           "Sentences cone in many shaes and sizes.",
                                           "Learnin R is fun",
                                           "yesterday was Friday",
                                           "bing search engine"), stringsAsFactors = FALSE)
# create predefined misspelled data frame
df_r <- data.frame("misspelled" = c("tonigh", "Wuld", "trave", "muh", "undestand", "shaes", "Learnin"),
                   "correction" = c("tonight", "Would", "travel", "much", "understand", "shapes", "Learning"))
library(DataCombine)
library(doParallel)
library(foreach)
no_cores <- detectCores()
cl <- makeCluster(no_cores[1]-1)
registerDoParallel(cl)
df_replacement <- foreach((df$Text), .combine = cbind) %dopar% {
  replacement = DataCombine::FindReplace(data = df, Var = "Text", replaceData = df_r,
                                         from = "misspelled", to = "correction", exact = FALSE)
  replacement
}
stopCluster(cl)
I'm not sure what I did wrong in the foreach part. Any advice is appreciated.
Upvotes: 1
Views: 255
Reputation: 2956
I think you are looking for this:
df_replacement <- foreach(i = (rownames(df)), .combine = rbind) %dopar% {
  replacement = DataCombine::FindReplace(data = df[i,], Var = "Text", replaceData = df_r,
                                         from = "misspelled", to = "correction", exact = FALSE)
  replacement
}
What's happening:
foreach knows it has to run once per row, but your function call always passes the whole data frame. So the output of every iteration is also the whole data frame, which is two columns wide, and .combine = cbind binds those results side by side: 2 (columns) * 7 (iterations) = 14 columns. So make sure FindReplace only receives the rows you want in each iteration, not the whole data frame.
I changed the code so that each iteration passes only its own row, df[i,], to FindReplace. I also changed cbind to rbind, since you want to stack the rows afterwards rather than add columns.
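To make the difference in shape concrete, here is a minimal base-R sketch. It makes two assumptions that are not in the original code: a sequential lapply stands in for foreach with %dopar%, and fix_text is a hypothetical gsub-based stand-in for DataCombine::FindReplace with exact = FALSE, so it runs without a cluster or extra packages.

```r
# Data frames from the question
df <- data.frame(Index = 1:7,
                 Text = c("Brad came to dinner with us tonigh.",
                          "Wuld you like to trave with me?",
                          "There is so muh to undestand.",
                          "Sentences cone in many shaes and sizes.",
                          "Learnin R is fun",
                          "yesterday was Friday",
                          "bing search engine"),
                 stringsAsFactors = FALSE)
df_r <- data.frame(misspelled = c("tonigh", "Wuld", "trave", "muh",
                                  "undestand", "shaes", "Learnin"),
                   correction = c("tonight", "Would", "travel", "much",
                                  "understand", "shapes", "Learning"),
                   stringsAsFactors = FALSE)

# Hypothetical stand-in for FindReplace: apply each replacement pair in turn
fix_text <- function(data) {
  for (k in seq_len(nrow(df_r))) {
    data$Text <- gsub(df_r$misspelled[k], df_r$correction[k],
                      data$Text, fixed = TRUE)
  }
  data
}

# Pattern from the question: every iteration processes the whole data frame,
# and the seven results are bound side by side
wide <- do.call(cbind, lapply(seq_len(nrow(df)), function(i) fix_text(df)))

# Pattern from the answer: every iteration processes one row,
# and the seven results are stacked
long <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) fix_text(df[i, ])))

dim(wide)  # 7 rows, 14 columns
dim(long)  # 7 rows, 2 columns
```

The first pattern repeats the full corrected data frame seven times across the columns, while the second returns one corrected data frame with the original shape.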
Upvotes: 1