Reputation: 2763
I have measurements of maximum and minimum temperature and precipitation that are organized as arrays of size (100x96x50769), where i and j are grid cells with coordinates associated and z means the number of measurements over time.
Conceptually, it looks like this:
I am using the climdex.pcic
package to calculate indices of extreme weather events. Given a time series of maximum and minimum temperature and precipitation, the climdexInput.raw
function will return a climdexIput
object that can be used to determine several indices: number of frost days, number of summer days, consecutive dry days etc.
The call for the function is pretty simple:
ci <- climdexInput.raw(tmax=x, tmin=y, prec=z,
t, t, t, base.range=c(1961,1990))
where x is a vector of maximum temperatures, y is a vector of minimum temperatures, z is a vector of precipitation and t is a vector with dates under which x, y and z were measured.
What I would like to do is to extract the timeseries for each element of my array (i.e. each grid cell in the figure above) and use it to run the climdexInput.raw
function.
Because of the large number of elements of real data, I want to run this task in parallel on my 4-core Linux server. However, I have no experience with parallelization in R.
Here's one example of my program (with intentionally reduced dimensions to make execution faster on your computer):
library(climdex.pcic)
# Create some dates
t <- seq(as.Date('2000-01-01'), as.Date('2010-12-31'), 'day')
# Parse the dates into PCICt
t <- as.PCICt(strftime(t), cal='gregorian')
# Create some dummy weather data, with dimensions `# of lat`, `# of lon` and `# of timesteps`
nc.min <- array(runif(10*9*4018, min=0, max=15), c(10, 9, 4018))
nc.max <- array(runif(10*9*4018, min=25, max=40), c(10, 9, 4018))
nc.prc <- array(runif(10*9*4018, min=0, max=25), c(10, 9, 4018))
# Create "ci" object
ci <- climdexInput.raw(tmax=nc.max[1,1,], tmin=nc.min[1,1,], prec=nc.prc[1,1,],
t, t, t, base.range=c(2000,2005))
# Once you have “ci”, you can compute any of the indices provided by the climdex.pcic package.
# The example below is for cumulative # of dry days per year:
cdd <- climdex.cdd(ci, spells.can.span.years = TRUE)
Now, please note that in the example above I used only the first element of my array ([1,1,]) as an example in the climdexInput.raw
function.
How can do the same for all elements taking advantage of parallel processing, possibly by looping over the dimensions i
and j
of my array?
Upvotes: 2
Views: 732
Reputation: 11728
You can use foreach to do that:
library(doParallel)
registerDoParallel(cl <- makeCluster(3))
res <- foreach(j = seq_len(ncol(nc.min))) %:%
foreach(i = seq_len(nrow(nc.min))) %dopar% {
ci <- climdex.pcic::climdexInput.raw(
tmax=nc.max[i,j,],
tmin=nc.min[i,j,],
prec=nc.prc[i,j,],
t, t, t,
base.range=c(2000,2005)
)
}
stopCluster(cl)
See my guide on parallelism using foreach: https://privefl.github.io/blog/a-guide-to-parallelism-in-r/.
Then, to compute an index, just use climdex.cdd(res[[1]][[1]], spells.can.span.years = TRUE)
(j
first, i
second).
Upvotes: 1