Francesco Giardina
Francesco Giardina

Reputation: 9

Remove outliers from Raster object with multiple layers

I have a NetCDF file at very high spatial resolution:

dimensions : 2160, 4320, 9331200, 172  (nrow, ncol, ncell, nlayers)
resolution : 0.0833333, 0.0833333  (x, y)
extent     : -180, 179.9998, -90, 89.99993  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 

The final objective is to calculate the mean across all 172 layers (i.e. the temporal dimension). Before doing that, I would like to remove outliers from the time series cell by cell. For instance using the formula: x ± 2.5 * std(x), where x is the timeseries of the data in every cell.

I tried transforming the Raster object into a dataframe, but it is quite slow given the resolution. I also tried using the functions clamp and reclassify from the raster package, but the values used to subset are fixed for the entire Raster object:

x <- clamp(x, lower = 0, upper = 2.5)

What I'm looking for is a function to remove outliers based on the timeseries of every cell (i.e. dynamic lower and upper range above).

Is there a way of doing this directly with the Raster format, either with the package Raster or Terra?

There is a similar question here but in that case the Raster object doesn't have multiple layers (corresponding to the temporal dimension).

Upvotes: 0

Views: 200

Answers (1)

Robert Hijmans
Robert Hijmans

Reputation: 47481

Example data (please always include some when asking an R question)

library(terra)
s <- rast(ncol=10, nrow=10, nlyr=30)
set.seed(1)
values(s) <- rnorm(size(s), 10)
s[1] <- NA

Solution 1: Compute the mean and sd, and use ifel

mn <- mean(s)
sdv <- stdev(s) * 1.5
lower <- mn-sdv
upper <- mn+sdv

x <- ifel(s < lower, NA, ifel(s > upper, NA, s))

Check the results for a cell

i <- 57
round(s[i], 2)
#  lyr.1 lyr.2 lyr.3 lyr.4 lyr.5 lyr.6 lyr.7 lyr.8 lyr.9 lyr.10 lyr.11 lyr.12 lyr.13
#1  9.63    11   7.6  8.83 10.51  7.67    10 12.02  9.23  10.96  11.21  10.12  10.61
#  lyr.14 lyr.15 lyr.16 lyr.17 lyr.18 lyr.19 lyr.20 lyr.21 lyr.22 lyr.23 lyr.24 lyr.25
#1   8.83  10.26  11.73   9.49   8.73   9.02  11.75   7.92   9.95    9.6    9.5   9.46
#  lyr.26 lyr.27 lyr.28 lyr.29 lyr.30
#1   9.72  11.38  10.61   7.35  10.36

cbind(lower[i], upper[57])
#      mean    mean
#1 7.987305 11.6816

round(x[i], 2)
#  lyr.1 lyr.2 lyr.3 lyr.4 lyr.5 lyr.6 lyr.7 lyr.8 lyr.9 lyr.10 lyr.11 lyr.12 lyr.13
#1  9.63    11    NA  8.83 10.51    NA    10    NA  9.23  10.96  11.21  10.12  10.61
#  lyr.14 lyr.15 lyr.16 lyr.17 lyr.18 lyr.19 lyr.20 lyr.21 lyr.22 lyr.23 lyr.24 lyr.25
#1   8.83  10.26     NA   9.49   8.73   9.02     NA     NA   9.95    9.6    9.5   9.46
#  lyr.26 lyr.27 lyr.28 lyr.29 lyr.30
#1   9.72  11.38  10.61     NA  10.36

Solution 2: use app

f <- function(x) {
    m <- mean(x) + c(-1,1) * sd(x)
    x[x < m[1] | x > m[2]] <- NA
    x
}
y <- app(s, f)

Solution 3. With terra 1.7.20 (currently the development version) you can now also do

 z <- clamp(s, lower, upper, values=FALSE)

Upvotes: 0

Related Questions