Sean Basquill
Sean Basquill

Reputation: 1

Using terra in R to calculate & map similarity and uniqueness across cells of a large GDM-based raster

I am looking to employ terra to update an analytical workflow (from Mokany et al 2022; Glob Ecol and Biogeo) originally written in R with the raster package. The workflow involves spatial analyses of predicted dissimilarities, established from a generalized dissimilarity model (GDM). The primary data are GDM transformed predictor rasters (outputs of GDM model fitting), assembled in a single multi-layer raster object. Spatial analyses involve using values from the transformed rasters to predict dissimilarities among pairs of grid cells and applying those outputs for answering various spatial questions. The project raster is large (requires 102 gigs to process; via mem_info).

My specific question: what terra-based code is suitable for applying GDM predictions to assess spatial patterns of similarity and uniqueness (i.e., mean similarity between each grid cell and a random sample of locations across the entire study region). For outputs, I am looking for 1) a raster of predicted similarity and 2) a raster of predicted uniqueness.

This question is quite comparable to Employing `terra::` to avoid std::bad_alloc error when extracting values from large SpatRaster stack. I have sought to use relevant components of that helpful answer here, but haven't successfully developed a workable solution.


I have created a reprex and show the initial code chunk I'm working with. That code draws heavily on the stackoverflow answer referenced above; I show where I'm stalled.


Create multilayer sample raster

library(terra)
r <- rast(ncol=100, nrow=100, xmin=-150, xmax=-80, ymin=20, ymax=60,
          nlyr=5, vals=runif(10000*5))
r[[1:3]][10:20] <- NA
r[100:120] <- NA

Initial (partial) attempt at new workflow.

Note, here a sample of values from the full raster is employed because calculating pair-wise dis/similarities among all grid cells is not advisable.

# specify percentage of raster for sampling
n.sub <- ncell(r) * 0.5

# Return matrix of sampled values  
x <- na.omit(spatSample(r, n.sub, "regular", as.df=FALSE)) 

# Compute dissimilarity between grid-cell pairs
sub.dissimilarity <- matrix(0, n.sub, n.sub)

# Flag - if there are na values in the sample, I need to revise `n.sub` in the previous line to match the number of rows after `na.omit`

# Use an arbitrary value to set up loop
gdmRastMod <- list(intercept = 2)

for(i in 1:(n.sub-1)) {
    for(j in (i+1):n.sub) {
        ecol.dist <- sum(abs(x[i, ] - x[j, ]))
        sub.dissimilarity[j,i] <- 1 - exp(-1 * (gdmRastMod$intercept + ecol.dist))
    }
}

At this point, I surmise two functions are needed for calculating: 1) dis/similarity and 2) uniqueness (dis/similarity to the region as a whole) and applying those predictions across the study extent.


UPDATE: following @robert-hijmans helpful solution, I was able to produce a raster of predicted uniqueness across my study extent. The code scaled very well with my project data.

I get an Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'deepcopy': subscript out of bounds when I run the first code chunk (re: similarity across 10 cells) with my project data. The code chunk works with sample data, but the output is a multi-layer raster (was looking for single layer of dis/similarity).

Upvotes: 0

Views: 114

Answers (1)

Robert Hijmans
Robert Hijmans

Reputation: 47481

You ask for "pairwise-similarities". If you mean pairs of cells, those are in 1-sub.diss.m, for the sample. Do you actually want a raster for that (with ncell(r) layers)?

For the first 10 cells:

dfun <- function(cell) {
  if (is.na(r[cell][1])) return(NULL)
  edist <- sum(r - unlist(r[cell]))
  out <- 1 - exp(-1 * (gdmRastMod$intercept + edist))
  # with large datasets 
  # out <- writeRaster(out, paste0(tempfile(), ".tif"))
  out
}
d <- lapply(1:10, dfun)
names(d) <- 1:10

dr <- rast(d[!sapply(d, is.null)])
dr
#class       : SpatRaster 
#dimensions  : 100, 100, 9  (nrow, ncol, nlyr)
#resolution  : 0.7, 0.4  (x, y)
#extent      : -150, -80, 20, 60  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#source(s)   : memory
#names       :           1,         2,          3,         4,          5,          6, ... 
#min values  : -0.03850853, 0.1265724, -0.8860728, 0.4260013, -0.5692148, -0.5711362, ... 
#max values  :  0.98196854, 0.9848348,  0.9672524, 0.9900337,  0.9727540,  0.9727206, ... 

You also ask for "similarity to the region as a whole", but you do not define what you mean by that (what measure to use). But you could do something like this:

m <- global(r, "mean", na.rm=T)
edist <- sum(r - unlist(m))
dissim <- 1 - exp(-1 * (gdmRastMod$intercept + edist))

dissim
#class       : SpatRaster 
#dimensions  : 100, 100, 1  (nrow, ncol, nlyr)
#resolution  : 0.7, 0.4  (x, y)
#extent      : -150, -80, 20, 60  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#source(s)   : memory
#name        :        sum 
#min value   : 0.02861172 
#max value   : 0.98313393 

Upvotes: 0

Related Questions