Reputation: 4092
I am trying to optimize a function that I am going to do with a several rasters with millions of cells, so I want to parallelize this function.
So this is the initial raster:
library(raster)
SPA <- raster(nrows=3, ncols=3, xmn = -10, xmx = -4, ymn = 4, ymx = 10)
values(SPA) <- c(0.1, 0.4, 0.6, 0, 0.2, 0.4, 0, 0.1, 0.2)
plot(SPA)
The objective of the function is to get a dataframe with the distance between all of the cells present in the raster with a column from, a column to, and a column distance.
in order to do that I create a transition layer using the gdistance package:
library(gdistance)
h16 <- transition(SPA, transitionFunction=function(x){1},16,symm=FALSE)
h16 <- geoCorrection(h16, scl=FALSE)
and the origin points for every cell:
B <- xyFromCell(SPA, cell = 1:ncell(SPA))
head(B)
x y
[1,] -9 9
[2,] -7 9
[3,] -5 9
[4,] -9 7
[5,] -7 7
[6,] -5 7
With some help from some stackoverflow answers I made this function which is faster than the accCost one in gdistance
accCost2 <- function(x, fromCoords) {
fromCells <- cellFromXY(x, fromCoords)
tr <- transitionMatrix(x)
tr <- rBind(tr, rep(0, nrow(tr)))
tr <- cBind(tr, rep(0, nrow(tr)))
startNode <- nrow(tr)
adjP <- cbind(rep(startNode, times = length(fromCells)), fromCells)
tr[adjP] <- Inf
adjacencyGraph <- graph.adjacency(tr, mode = "directed", weighted = TRUE)
E(adjacencyGraph)$weight <- 1/E(adjacencyGraph)$weight
return(shortest.paths(adjacencyGraph, v = startNode, mode = "out")[-startNode])
}
And using apply I get my desired data.frame
connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),to = rep(1:nrow(B), nrow(B)), dist =as.vector(apply(B,1, accCost2, x = h16)))
head(connections)
from to dist
1 1 1 0.0
2 1 2 219915.7
3 1 3 439831.3
4 1 4 221191.8
5 1 5 312305.7
6 1 6 493316.1
library("parallel")
cl = makeCluster(3)
clusterExport(cl, c("B", "h16", "accCost2"))
clusterEvalQ(cl, library(gdistance), library(raster))
connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),to = rep(1:nrow(B), nrow(B)), dist =as.vector(parRapply(cl, B,1, accCost2, x = h16)))
stopCluster(cl)
But I get the following error:
Error in x[i, , drop = FALSE] : object of type 'S4' is not subsettable
I am fairly new in parallelization, and I am not sure what I am doing wrong
Upvotes: 1
Views: 285
Reputation: 203
There are several syntax issues in your code.
This code works for me.
library("parallel")
accCost_wrap <- function(x){accCost2(h16,x)}
#Instead of including h16 in the parRapply function,
#just get it in the node environment
cl = makeCluster(3)
clusterExport(cl, c("h16", "accCost2"))
#B will be "sent" to the nodes through the parRapply function.
clusterEvalQ(cl, {library(gdistance)})
#raster is a dependency of gdistance, so no need to include raster here.
pp <- parRapply(cl, x=B, FUN=accCost_wrap)
stopCluster(cl)
connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),
to = rep(1:nrow(B), nrow(B)),
dist = as.vector(pp))
Your version of accCost is indeed faster than the version in gdistance. Your version omits the checks to see if your points are within the extent of your transition layer. Proceed with caution.
(You could make your function even faster by taking the cell numbers as input. Also, sending so much data back from each node does not seem very efficient.)
Upvotes: 3