Reputation: 59
I have quite a big set of keywords that I need to compare against an even bigger corpus of documents, counting the number of occurrences of each keyword.
Since the calculation takes hours, I decided to try parallel processing. On this forum I found the mclapply function from the parallel package, which seems helpful.
Being very new to R, I could not get the code working (see below for a short version). More specifically, I got this error:
"Error in get(as.character(FUN), mode = "function", envir = envir) : object 'FUN' of mode 'function' was not found"
rm(list = ls())

library(stringr)    # needed for str_count (moved up so the single-core run works)
library(parallel)

df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan",
              "skyline", "1988", "1400", "159")

# Count whole-word keyword matches in each document
countstrings <- function(x) {
  str_count(x, paste(sprintf("\\b%s\\b", keywords), collapse = '|'))
}

# Normal way with one processor
number_of_keywords <- countstrings(df)
# Result: [1] 3 2 2

# Attempt at parallel processing (this is the line that fails)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
number_of_keywords <- mclapply(cl, countstrings(df))
stopCluster(cl)

# Error in get(as.character(FUN), mode = "function", envir = envir) :
#   object 'FUN' of mode 'function' was not found
Any help is appreciated!
Upvotes: 1
Views: 3580
Reputation: 21749
This function should be faster. Here's an alternative way to do the parallel processing, using parSapply
(which returns a vector instead of a list):
# function to count exact whole-word keyword matches per document
count_strings <- function(x, words)
{
  sum(unlist(strsplit(x, ' ')) %in% words)
}

library(parallel)   # count_strings uses only base R, so stringr isn't needed here

mcluster <- makeCluster(detectCores())   # using all cores
number_of_keywords <- parSapply(mcluster, df, count_strings, keywords, USE.NAMES = FALSE)
stopCluster(mcluster)

number_of_keywords
# [1] 3 2 2
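
For reference, the error in the question comes from mixing the two interfaces in the parallel package: mclapply() is the fork-based function and takes the data and the function directly (no cluster object), while the cluster-based functions such as parLapply()/parSapply() take the cluster as their first argument. Here is a minimal sketch of both corrected calls using the original countstrings(); the clusterEvalQ()/clusterExport() lines are my assumption about what the workers need, since countstrings() relies on stringr and on the global keywords object:

library(stringr)
library(parallel)

# Fork-based: no cluster object; forked workers inherit memory (Unix/macOS only)
number_of_keywords <- mclapply(df, countstrings, mc.cores = detectCores() - 1)

# Cluster-based: the cluster is the first argument, and each worker
# needs the package and objects that countstrings() depends on
cl <- makeCluster(detectCores() - 1)
clusterEvalQ(cl, library(stringr))   # load stringr on every worker
clusterExport(cl, "keywords")        # ship the keywords vector to the workers
number_of_keywords <- parLapply(cl, df, countstrings)
stopCluster(cl)

Both return a list (use unlist() or parSapply() if you want a plain vector like the output above).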
Upvotes: 1