spops

Reputation: 676

How to correctly wrap R function with apply() to implement parallel processing?

I have a function that creates a cartogram of fish catch per country per year and puts that cartogram into a list of cartograms, depending on which year I feed it:

fishtogram <- function(year) {
  dfname <- paste0("carto", year) # name of the cartogram being made
  map_year <- get(paste0("map", year), map_years) # 'map_year' contains one SpatialPolygonsDataFrame of a year of fishing/country data pulled from the list of spdf's 'map_years'
  carto_maps[[dfname]] <<- cartogram(map_year, "CATCH", itermax=1) # This is the part that takes forever. Create cartogram named 'dfname', chuck it into the carto_maps list
  plot(carto_maps[[dfname]], main=dfname) # plot it 
  print(paste("Finished", dfname, "at", Sys.time())) # print time finished cartogram
  writeOGR(obj = carto_maps[[dfname]], dsn = "Shapefiles", layer = dfname, driver = "ESRI Shapefile", overwrite_layer=TRUE) # Save cartogram as shapefile
}

Originally this was all in a for loop (for the years 1950-2014), and it does the job, just extremely slowly. The part that is slowing me down is the cartogram function. Currently, producing one cartogram takes about an hour and uses about 13% of my CPU. I would like to use parallel processing to make 3-4 cartograms at once and hopefully speed things up.

How do I wrap this in an apply function correctly to both loop through the years I want and use parallel processing? I've been using this R bloggers post for guidance. My attempt:

lapply(seq(1975, 2014, 10), fishtogram, .parallel=TRUE)

 >Error in FUN(X[[i]], ...) : unused argument (.parallel = TRUE)

Thank you to @patL for telling me to use lapply instead of apply.

My code & data can be found here: https://github.com/popovs/400m-cartograms/blob/master/400m_cartograms.R

Upvotes: 1

Views: 906

Answers (2)

patL

Reputation: 2299

To go parallel you can try the parLapply family of functions from the parallel package.

Following the steps from this page, you will first need to detect the number of cores:

library(parallel)

no_cores <- detectCores() - 1 # it is advisable to use one fewer than the number of cores

cl <- makeCluster(no_cores) #initiate cluster

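It is important to export all functions and global objects you will use during your parallelization. Objects created inside fishtogram, such as dfname, are local to the function and do not need to be exported (exporting a name that does not exist in the global environment raises an error). Packages used inside the function must also be loaded on each worker, for example with clusterEvalQ.

clusterExport(cl, "fishtogram")
clusterExport(cl, "map_years")
clusterExport(cl, "carto_maps")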
...

Then you can run your parallelized version of lapply:

parLapply(cl, seq(1975, 2014, 10), fishtogram)

Finally, stop the cluster:

stopCluster(cl)
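
Putting the steps together, here is a minimal self-contained sketch. It is not the asker's code: slow_square is a hypothetical stand-in for fishtogram, and offsets is a hypothetical stand-in for map_years. One thing worth noting is that each worker is a separate R process, so an assignment with <<- inside the function only changes the worker's copy of the object. The sketch therefore returns the result from the function and collects it on the master.

```r
library(parallel)

# Hypothetical stand-in for map_years: a named list looked up by year
offsets <- list(y1975 = 1, y1985 = 2, y1995 = 3, y2005 = 4)

# Hypothetical stand-in for fishtogram: returns its result instead of using <<-
slow_square <- function(year) {
  offset <- offsets[[paste0("y", year)]]  # global object, must exist on the worker
  Sys.sleep(0.1)                          # pretend this is the slow step
  year^2 + offset
}

years <- seq(1975, 2014, 10)

cl <- makeCluster(2)  # two workers, just for illustration

# Copy the objects the function needs to every worker;
# envir = environment() exports from wherever this code is being run
clusterExport(cl, "offsets", envir = environment())

# If the function used packages, they would be loaded on the workers here,
# e.g. clusterEvalQ(cl, library(somePackage))  # somePackage is a placeholder

results <- parLapply(cl, years, slow_square)
stopCluster(cl)

# Collect the returned values into a named list on the master
names(results) <- paste0("carto", years)
```

With this pattern, the plotting and shapefile-writing steps could either stay inside the function or be run on the master over the returned list, whichever is more convenient.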

There are other functions that can run your code in parallel (foreach, from the foreach package; mclapply, also from the parallel package; etc.).
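
As a rough illustration of the mclapply alternative, here is a small sketch using a hypothetical slow_square function in place of fishtogram. mclapply relies on forking, which is not available on Windows, so the sketch assumes it should fall back to a single core there. Because forked workers inherit the parent's objects, no clusterExport step is needed.

```r
library(parallel)

# Hypothetical stand-in for the slow per-year function
slow_square <- function(year) {
  Sys.sleep(0.1)  # pretend this is the slow step
  year^2
}

years <- seq(1975, 2014, 10)

# Forking is unavailable on Windows, where mc.cores must be 1
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same calling convention as lapply, plus the number of cores to use
results <- mclapply(years, slow_square, mc.cores = n_cores)
```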

Upvotes: 1

C-x C-c

Reputation: 1311

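Your specific error comes from how the function is passed. With the apply family you pass the function by name, without parentheses, and without arguments such as .parallel that lapply itself does not accept: any extra argument is forwarded to fishtogram, which is what triggers the "unused argument" message. Since seq() returns a plain vector rather than an array, use lapply rather than apply:

lapply(seq(1975, 2014, 10), fishtogram)

..would fix that error.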
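
For reference, the "unused argument" message from the question can be reproduced with a toy function. lapply passes any extra named arguments on to the function it is applying, so an argument the function does not accept raises this error. The sketch below uses a hypothetical one_arg function in place of fishtogram (.parallel is an argument of plyr functions such as llply, not of base lapply).

```r
# Hypothetical one-argument function, like fishtogram(year)
one_arg <- function(year) year + 1

# Extra named arguments to lapply() are forwarded to the function,
# so an argument the function does not accept raises an error
msg <- tryCatch(
  lapply(1:3, one_arg, .parallel = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)

# Without the extra argument the call succeeds
ok <- lapply(1:3, one_arg)
```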

Upvotes: 0
