TheRealJimShady
TheRealJimShady

Reputation: 917

Calculate stats on a group of raster stack layers using {terra}?

I have a raster stack of 4 layers. Two of the layers are from model 1, two of the layers are from model 2. I need to calculate the median, 5th percentile and 95th percentile of each model. Is there some way to do this in one step? i.e. without writing out two intermediate stacks of rasters and then joining them together again. My attempt is below but it doesn't do the function by group.

library("terra")   
# Create some toy data
a <- rast(ncol = 10, nrow = 10, vals=rep(5,100), names=1)
b <- rast(ncol = 10, nrow = 10, vals=rep(10,100), names=1)
c <- rast(ncol = 10, nrow = 10, vals=rep(5,100), names=2)
d <- rast(ncol = 10, nrow = 10, vals=rep(10,100), names=2)
z <- c(a, b, c, d)

# Try to write a function to do the work
app(z,
    function(x) {
      c(median(x), quantile(x, c(0.05, 0.95)))
      },
     filename = "grouped_stats.tif)

My desired result is a raster stack of 6 layers. Something like this.

class       : SpatRaster
dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
resolution  : 36, 18  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84
sources     : memory  (3 layers)
              memory  (3 layers)
names       : median_1, q5_1, q95_1, median_2, pc5_2, pc95_2
min values  :      7.5,  5.0,  10.0,      7.5,   5.0,   10.0
max values  :      7.5,  5.0,  10.0,      7.5,   5.0,   10.0

Any ideas please? Thanks.

EFFORT 1

Inspired by @spacedman I wrote this function but it doesn't quite get me there. Putting it here as possible inspiration for others.

grouped_stats <- function(x) {
  layers_names <- unique(names(x))
  cell_output <- NA
  for (each_layer in layers_names) {
     cell_output <- rbind(cell_output,
                    c(median(x[[each_layer]], na.rm = TRUE),
                      quantile(x[[each_layer]], 0.05, 0.95)))
     names(cell_output) <- glue("{each_layer}_{c('median','pc5','pc95')}")
  }
  cell_output
}

g <- app(z, fun = grouped_stats)

EFFORT 2

Getting nearer I think, but not quite there.

my_stats_function <- function(x) {c(median(x), quantile(0.05, 0.95))}

app(z, 
    function(x){
      unlist(tapply(x, layer_names, my_stats_function))
      })

class       : SpatRaster 
dimensions  : 10, 10, 4  (nrow, ncol, nlyr)
resolution  : 36, 18  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84
source      : memory
names       :   11, 1.95%,   21, 2.95%
min values  : 7.50,  0.05, 7.50,  0.05
max values  : 7.50,  0.05, 7.50,  0.05

EFFORT 3

Think I'm about there. :-)

my_stats_function <- function(x) {c(median(x), quantile(x, c(0.05, 0.95)))}

app(z, 
    function(x){
      unlist(tapply(x, layer_names, my_stats_function))
      })

class       : SpatRaster
dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
resolution  : 36, 18  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84
source      : memory
names       : 11, 1.5%, 1.95%, 21, 2.5%, 2.95%
min values  :  5,    5,     5,  5,    5,     5
max values  :  5,    5,     5,  5,    5,     5

Upvotes: 1

Views: 1508

Answers (2)

Robert Hijmans
Robert Hijmans

Reputation: 47036

Your example data (with changed layer names for clarity)

library(terra)
a1 <- rast(ncol = 10, nrow = 10, vals=rep(5,100),  names="A")
a2 <- rast(ncol = 10, nrow = 10, vals=rep(10,100), names="A")
b1 <- rast(ncol = 10, nrow = 10, vals=rep(5,100),  names="B")
b2 <- rast(ncol = 10, nrow = 10, vals=rep(10,100), names="B")
z <- c(a1, a2, b1, b2)

What your version of "terra" it was not possible to tapp because it did not handle functions that return multiple numbers. However, you can do that with the current version

f <- function(x) quantile(x, c(0.5, 0.05, 0.95))
x <- tapp(z, names(z), f)
x
#class       : SpatRaster 
#dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
#resolution  : 36, 18  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source(s)   : memory
#names       : A_5%, A_50%, A_95%, B_5%, B_50%, B_95%
#min values  : 5.25,   7.5,  9.75, 5.25,   7.5,  9.75 
#max values  : 5.25,   7.5,  9.75, 5.25,   7.5,  9.75 

That is the cleanest solution, I think. But in this case, because there is a quantile<SpatRaster> method it might be faster to do

probs <- c(0.05, 0.5, 0.95) 
ids <- names(z)
uids <- unique(ids)
x <- lapply(uids, function(i) quantile(z[[ids == i]], probs))
rast(x)
#class       : SpatRaster 
#dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
#resolution  : 36, 18  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source(s)   : memory
#names       : q0.05, q0.5, q0.95, q0.05, q0.5, q0.95 
#min values  :  5.25,  7.5,  9.75,  5.25,  7.5,  9.75 
#max values  :  5.25,  7.5,  9.75,  5.25,  7.5,  9.75 

But I would only even look at that if the rasters are very large.

I assume that in all cases it is more efficient to use

function(x) quantile(x, c(0.5, 0.05, 0.95))

instead of

function(x) c(median(x), quantile(x, c(0.05, 0.95)))

Here is a less efficient solution that uses a loop and app

f <- function(x) quantile(x, c(0.05, 0.5, 0.95)) 
ids <- names(z)
uids <- unique(ids)
x <- lapply(uids, function(i) app(z[[ids == i]], f))
rast(x)
#class       : SpatRaster 
#dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
#resolution  : 36, 18  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#sources     : memory  (3 layers) 
#              memory  (3 layers) 
#names       :   5%, 50%,  95%,   5%, 50%,  95% 
#min values  : 5.25, 7.5, 9.75, 5.25, 7.5, 9.75 
#max values  : 5.25, 7.5, 9.75, 5.25, 7.5, 9.75 

Upvotes: 4

Spacedman
Spacedman

Reputation: 94172

If you work with the parts within the function you can do this:

First define which layers are which group:

> p1 = c(1,3)
> p2 = c(2,4)

Then subset x and work with that:

> app(z, function(x){c(median(x[p1]), quantile(x[p1], c(0.05, 0.95)), median(x[p2], ), quantile(x[p2], c(0.05, 0.95)))})
class       : SpatRaster                  
dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
resolution  : 36, 18  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
source      : memory 
names       : lyr.1, lyr.2, lyr.3, lyr.4, lyr.5, lyr.6 
min values  :     5,     5,     5,    10,    10,    10 
max values  :     5,     5,     5,    10,    10,    10 

I'm not sure how you get a median of 7.5 in your sample though, so maybe you used mean for that, and maybe your groupings are different...

Upvotes: 1

Related Questions