SophiaL
SophiaL

Reputation: 71

Pairwise raster comparison in R: alternative to for-loop?

How to efficiently compare pairs of distribution rasters (raster layers containing only 0 and 1)? I need to get a measure of the similarity among ~6500 individual global rasters. Istat from SDMTools should do the job.

Here is my code:

library(raster)
library(SDMTools)

Create reproducible example data: rasters with values of 0 and 1

# first raster
r1 <- raster(nrow=1800, ncol=3600, xmn=-18000000, xmx=18000000, ymn=-9000000, ymx=9000000, 
             crs="+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs", 
             resolution=10000, vals=0)
r2 <- raster(nrow=1800, ncol=3600, xmn=-18000000, xmx=0, ymn=0, ymx=9000000, 
             crs="+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs", 
             resolution=10000, vals=2)
r12 <- mosaic(r1, r2, fun=mean)

# second raster
r3 <- raster(nrow=1800, ncol=3600, xmn=-18000000, xmx=18000000, ymn=-9000000, ymx=9000000, 
             crs="+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs", 
             resolution=10000, vals=0)
r4 <- raster(nrow=1800, ncol=3600, xmn=-12000000, xmx=15000000, ymn=2000000, ymx=3000000, 
             crs="+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs", 
             resolution=10000, vals=2)
r34 <- mosaic(r3, r4, fun=mean)

List rasters

files_list <- list(r12, r34)

Create empty matrix to fill with data from loop

ras_comp <- matrix(NA, nrow=length(files_list), ncol=length(files_list))
ras_comp
# label rows and columns of matrix
rownames(ras_comp) <- c("r12", "r34")
colnames(ras_comp) <- c("r12", "r34")
ras_comp

Loop to compare all possible pairs of matrices/rasters

for (i in 1:length(files_list)) {
  # load raster i
  ras_i <- as.matrix(files_list[[i]])

  for (j in 1:length(files_list)) {
    # load raster j
    ras_j <- as.matrix(files_list[[j]])

    # compare both rasters
    ras_Istat <- Istat(ras_i, ras_j, old=F)

    # write value into matrix
    ras_comp[i,j] <- ras_Istat
  }
}

Check final matrix

ras_comp
> ras_comp
          r12       r34
r12 1.0000000 0.1814437
r34 0.1814437 1.0000000

Converting the rasters to matrices with as.matrix reduces the calculation time significantly and the resulting final table is what I need but doing this for thousands of rasters takes forever to complete. How can the code be optimized in order to compare the rasters in a more efficient way?

Upvotes: 1

Views: 577

Answers (1)

Spacedman
Spacedman

Reputation: 94237

Istat does a bunch of tests and scalings before a simple computation. If you know those tests pass you can do the scalings as a one off and work on the scaled values. It does:

if (length(which(dim(x) == dim(y))) != 2) 
    stop("matrix / raster objects must be of the same extent")
if (min(c(x, y), na.rm = T) < 0) 
    stop("all values must be positive")

Then it checks where both rasters are "finite", which includes NA values:

pos = which(is.finite(x) & is.finite(y))

It then computes the scaled values of the rasters:

px = x[pos]/sum(x[pos])
py = y[pos]/sum(y[pos])
H = sqrt(sum((sqrt(px) - sqrt(py))^2))

If old=FALSE like you have then it returns:

    return(1 - (H^2)/2)

> Istat(r12,r34)
[1] 0.1814437

If I cut out the tests and write a function that works on the scaled values I can get it down to:

fIstat = function(px,py){
    1 - (sum((sqrt(px) - sqrt(py))^2))/2
}

Test by scaling the rasters and running:

r12px = r12[]/sum(r12[])
r34px = r34[]/sum(r34[])
fIstat(r12px, r34px)
# [1] 0.1814437

Same value. Great, but is it quicker?

> microbenchmark(fIstat(r12px, r34px), Istat(r12,r34))
Unit: milliseconds
                 expr        min         lq       mean     median         uq
 fIstat(r12px, r34px)   49.95867   78.28649   78.10863   79.45235   80.85234
      Istat(r12, r34) 1084.84825 1181.31116 1217.64122 1212.93180 1263.50811
       max neval
  106.6803   100
 1349.0239   100

Yes, by a large factor.

So... if your data have no missing values or infinities, create a files_list of these scaled raster values, call my fIstat, only loop over the upper triangle, and you should speed this up by a factor of 10.

Upvotes: 1

Related Questions