Reputation: 9
I have a NetCDF file at very high spatial resolution:
dimensions : 2160, 4320, 9331200, 172 (nrow, ncol, ncell, nlayers)
resolution : 0.0833333, 0.0833333 (x, y)
extent : -180, 179.9998, -90, 89.99993 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
The final objective is to calculate the mean across all 172 layers (i.e. the temporal dimension). Before doing that, I would like to remove outliers from the time series cell by cell. For instance using the formula: x ± 2.5 * std(x)
, where x is the timeseries of the data in every cell.
I tried transforming the Raster object into a dataframe, but it is quite slow given the resolution. I also tried using the functions clamp
and reclassify
from the raster package, but the values used to subset are fixed for the entire Raster object:
x <- clamp(x, lower = 0, upper = 2.5)
What I'm looking for is a function to remove outliers based on the timeseries of every cell (i.e. dynamic lower and upper range above).
Is there a way of doing this directly with the Raster format, either with the package Raster or Terra?
There is a similar question here but in that case the Raster object doesn't have multiple layers (corresponding to the temporal dimension).
Upvotes: 0
Views: 200
Reputation: 47481
Example data (please always include some when asking an R question)
library(terra)
s <- rast(ncol=10, nrow=10, nlyr=30)
set.seed(1)
values(s) <- rnorm(size(s), 10)
s[1] <- NA
Solution 1: Compute the mean and sd, and use ifel
mn <- mean(s)
sdv <- stdev(s) * 1.5
lower <- mn-sdv
upper <- mn+sdv
x <- ifel(s < lower, NA, ifel(s > upper, NA, s))
Check the results for a cell
i <- 57
round(s[i], 2)
# lyr.1 lyr.2 lyr.3 lyr.4 lyr.5 lyr.6 lyr.7 lyr.8 lyr.9 lyr.10 lyr.11 lyr.12 lyr.13
#1 9.63 11 7.6 8.83 10.51 7.67 10 12.02 9.23 10.96 11.21 10.12 10.61
# lyr.14 lyr.15 lyr.16 lyr.17 lyr.18 lyr.19 lyr.20 lyr.21 lyr.22 lyr.23 lyr.24 lyr.25
#1 8.83 10.26 11.73 9.49 8.73 9.02 11.75 7.92 9.95 9.6 9.5 9.46
# lyr.26 lyr.27 lyr.28 lyr.29 lyr.30
#1 9.72 11.38 10.61 7.35 10.36
cbind(lower[i], upper[57])
# mean mean
#1 7.987305 11.6816
round(x[i], 2)
# lyr.1 lyr.2 lyr.3 lyr.4 lyr.5 lyr.6 lyr.7 lyr.8 lyr.9 lyr.10 lyr.11 lyr.12 lyr.13
#1 9.63 11 NA 8.83 10.51 NA 10 NA 9.23 10.96 11.21 10.12 10.61
# lyr.14 lyr.15 lyr.16 lyr.17 lyr.18 lyr.19 lyr.20 lyr.21 lyr.22 lyr.23 lyr.24 lyr.25
#1 8.83 10.26 NA 9.49 8.73 9.02 NA NA 9.95 9.6 9.5 9.46
# lyr.26 lyr.27 lyr.28 lyr.29 lyr.30
#1 9.72 11.38 10.61 NA 10.36
Solution 2: use app
f <- function(x) {
m <- mean(x) + c(-1,1) * sd(x)
x[x < m[1] | x > m[2]] <- NA
x
}
y <- app(s, f)
Solution 3. With terra 1.7.20 (currently the development version) you can now also do
z <- clamp(s, lower, upper, values=FALSE)
Upvotes: 0