sedonarae
sedonarae

Reputation: 11

How to select top 30% of cells in a raster (by cell value) in R

I have a raster in R, I need to select the highest cell values up until 30% of the raster area is selected.

The way that I've tried to accomplish this is by calculating the average cell area, and then calculating how many cells I need to meet this 30% target (I know this is not entirely accurate). Then I sort the raster values, descending. Here is where I'm stuck. Of these sorted values, I need to set all cells beyond #12,678 to NA. I can't figure out how to set values to NA based on their place in an order. Does anyone know how to do this? Or have a better idea for the entire process?

Upvotes: 1

Views: 719

Answers (2)

Robert Hijmans
Robert Hijmans

Reputation: 47061

If you assume a constant cell size, then you might as well ignore it and you can do something like this (find the quantile of interest and use that as a threshold):

library(terra)
r <- rast(system.file("ex/elev.tif", package = "terra"))

q <- global(r, \(i) quantile(i, 0.7, na.rm=T))
x <- ifel(r <= q[[1]], NA, r)
plot(x)

Check

g <- global(c(x,r), "notNA")
g[1,1]/g[2,1]
# 0.2979601

If variation in cell size is important, you could do something like the below (sort the values, and find the cells below the cumulative cell size threshold), but note that this approach is not memory-safe (would fail with very large rasters)

s <- c(r, cellSize(r, unit="km"))
d <- as.data.frame(s, cell=T, na.rm=T)
d <- d[order(d$elevation, decreasing=TRUE), ]
d$sumarea <- cumsum(d$area) / sum(d$area)
cells <- d[d$sumarea <= 0.3, "cell"]
msk <- rast(r)
msk[cells] <- 1
x <- mask(r, msk)
plot(x)

Upvotes: 0

dieghernan
dieghernan

Reputation: 3402

you can use tidyterra for this.

Once that you have identified your thresold (12,678) use tidyterra::filter() on your raster.

I don't know if you are using raster or terra, I use here terra and I show you how to convert back to Raster* object.

On this example I present a full workflow with an example raster file, just adapt it to your data.

library(terra)
#> terra 1.5.21
library(tidyterra)
#> -- Attaching packages --------------------------------------- tidyterra 0.1.0 --
#> 
#> Suppress this startup message by setting Sys.setenv(tidyterra.quiet = TRUE)
#> v tibble 3.1.7     v dplyr  1.0.9
#> v tidyr  1.2.0

# Create a SpatRaster from a file
f <- system.file("ex/elev.tif", package = "terra")
r <- rast(f)

# If you are using raster package use this line for switch it to terra:
# r <- rast(your_data)

r
#> class       : SpatRaster 
#> dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : elev.tif 
#> name        : elevation 
#> min value   :       141 
#> max value   :       547

totarea_km <- expanse(r, unit = "km")
prettyNum(totarea_km, big.mark = ",")
#> [1] "2,563.61"

plot(r)

enter image description here


# There is a bug on slice_max, use filter

# Get thresold for filtering by working on the values of the raster
# as it is a data frame
min_value <- as_tibble(r, na.rm = TRUE) %>%
  slice_max(order_by = elevation, prop = .3) %>%
  min()

# Here it goes!!
top30perc <- r %>% filter(elevation > min_value)

# Check
area_30perc <- expanse(top30perc, unit = "km")

prettyNum(area_30perc, big.mark = ",")
#> [1] "761.0991"

# Check
area_30perc / totarea_km
#> [1] 0.2968857

# Seems ok


plot(top30perc)

enter image description here

# If you need to convert to Raster* object

raster::stack(top30perc)
#> class      : RasterStack 
#> dimensions : 90, 95, 8550, 1  (nrow, ncol, ncell, nlayers)
#> resolution : 0.008333333, 0.008333333  (x, y)
#> extent     : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> crs        : +proj=longlat +datum=WGS84 +no_defs 
#> names      : elevation 
#> min values :       387 
#> max values :       547

Created on 2022-05-31 by the reprex package (v2.0.1)

Upvotes: 0

Related Questions