Reputation: 676
I have a function that creates a cartogram of fish catch per country per year and puts that cartogram into a list of cartograms, depending on which year I feed it:
fishtogram <- function(year) {
  dfname <- paste0("carto", year) # name of the cartogram being made
  map_year <- get(paste0("map", year), map_years) # 'map_year' contains one SpatialPolygonsDataFrame of a year of fishing/country data pulled from the list of spdf's 'map_years'
  carto_maps[[dfname]] <<- cartogram(map_year, "CATCH", itermax=1) # This is the part that takes forever. Create cartogram named 'dfname', chuck it into the carto_maps list
  plot(carto_maps[[dfname]], main=dfname) # plot it
  print(paste("Finished", dfname, "at", Sys.time())) # print time finished cartogram
  writeOGR(obj = carto_maps[[dfname]], dsn = "Shapefiles", layer = dfname, driver = "ESRI Shapefile", overwrite_layer=TRUE) # Save cartogram as shapefile
}
Originally this was all in a for loop (for the years 1950-2014), and it does the job, just extremely slowly. The part that is slowing me down is the cartogram function. Currently, producing one cartogram takes about an hour and uses about 13% of my CPU. I would like to use parallel processing to make 3-4 cartograms at once and hopefully speed things up.
How do I wrap this in an apply function correctly to both loop through the years I want and use parallel processing? I've been using this R bloggers post for guidance. My attempt:
lapply(seq(1975, 2014, 10), fishtogram, .parallel=TRUE)
>Error in FUN(X[[i]], ...) : unused argument (.parallel = TRUE)
Thank you to @patL for telling me to use lapply vs apply.
My code & data can be found here: https://github.com/popovs/400m-cartograms/blob/master/400m_cartograms.R
Upvotes: 1
Views: 906
Reputation: 2299
To go parallel, you can try some of the parApply family of functions (parLapply, parSapply, etc.) from the parallel library.
Following the steps from this page, you will first need to detect the number of cores:
library(parallel)
no_cores <- detectCores() - 1 # it is recommended to use one fewer than the number of cores
cl <- makeCluster(no_cores) #initiate cluster
It is important to export all functions and objects you will use during your parallelization:
clusterExport(cl, "fishtogram")
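clusterExport(cl, "carto_maps") # export the global list fishtogram writes to; dfname only exists inside the function, so it cannot be exported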
clusterExport(cl, "map_years")
...
Then you can run your parallelized version of lapply:
parLapply(cl, seq(1975, 2014, 10), fishtogram)
and finally stop the cluster
stopCluster(cl)
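One caveat with the steps above: each cluster worker is a separate R process, so an assignment made with <<- inside a function changes the worker's copy of the object, not the one in your own session. A minimal sketch of the alternative, where the function returns its result and parLapply collects it, is below. It is not your real code: slow_step and demo_maps are hypothetical stand-ins for the cartogram call and for map_years, and two workers are used only for illustration.

```r
library(parallel)

# Hypothetical stand-in data: one small object per year, playing the role of map_years.
years <- seq(1975, 2014, 10)
demo_maps <- setNames(lapply(years, function(y) data.frame(CATCH = y)),
                      paste0("map", years))

# Hypothetical stand-in for the slow step. It returns its result
# instead of assigning it with <<-.
slow_step <- function(year) {
  map_year <- demo_maps[[paste0("map", year)]]
  Sys.sleep(0.1)        # pretend this is the expensive computation
  map_year$CATCH * 2    # the value sent back to the main session
}

cl <- makeCluster(2)                                   # two workers, for illustration
clusterExport(cl, "demo_maps", envir = environment())  # data the workers need
results <- parLapply(cl, years, slow_step)             # one list element per year
stopCluster(cl)

# Rebuild the named list in the main session.
names(results) <- paste0("carto", years)
```

With the real function you would probably also need to load the packages it uses on every worker, for example with clusterEvalQ(cl, library(...)), and leave the plot() call to the main session.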
There are other functions that can run your code in parallel (foreach, from the foreach library; mclapply, also from the parallel library, etc.).
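For instance, a rough mclapply sketch, again with a hypothetical stand-in function (slow_step) rather than fishtogram. mclapply forks the current session, so nothing has to be exported to the workers, but forking is not available on Windows, where mc.cores has to stay at 1:

```r
library(parallel)

# Hypothetical stand-in for the slow function; it just returns a name.
slow_step <- function(year) {
  Sys.sleep(0.1)
  paste0("carto", year)
}

years <- seq(1975, 2014, 10)

# mclapply can only use more than one core on non-Windows systems.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(years, slow_step, mc.cores = n_cores)
```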
Upvotes: 1
Reputation: 1311
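Your specific error comes from how fishtogram is passed. With the apply family you pass the function itself, without parentheses, and lapply hands any extra argument such as .parallel=TRUE straight to fishtogram, which does not accept it. Since seq() returns a plain vector, use lapply rather than apply:
lapply(seq(1975, 2014, 10), fishtogram)
That would fix the error.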
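The "unused argument" message can be reproduced with any one-argument function, since lapply forwards extra arguments to the function it calls. A small sketch, with one_arg as a hypothetical stand-in for fishtogram:

```r
# Hypothetical one-argument function, standing in for fishtogram.
one_arg <- function(year) paste0("carto", year)

# The extra argument is forwarded to one_arg, which has no such parameter,
# so the call fails; tryCatch captures the error message as a string.
msg <- tryCatch(
  lapply(c(1975, 1985), one_arg, .parallel = TRUE),
  error = function(e) conditionMessage(e)
)

# Without the extra argument the same call works.
ok <- lapply(c(1975, 1985), one_arg)
```

As far as I know, .parallel is an argument of plyr functions such as llply, not of base lapply.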
Upvotes: 0