Reputation: 584
I am trying to resample a forest cover raster with high resolution (25 meters) and categorical data (1 to 13) to a new RasterLayer
with a lower resolution (~ 1 km). My idea is to combine the forest cover data with other lower-resolution raster data :
I tried raster::resample()
, but since the data is categorical I lost a lot of information:
summary(as.factor(df$loss_year_mosaic_30m))
0 1 2 3 4 5 6 7 8 9 10 11 12 13
3777691 65 101 50 151 145 159 295 291 134 102 126 104 91
As you can see, the new raster has the desired resolution but have lots of zeros as well. I suppose that is normal since I used the ´ngb´ option in resample
.
The second strategy was using raster::aggregate()
but I find difficult to define a factor integer since the change of resolution is not straightforward (like the double of the resolution or alike).
My high-resolution raster has the following resolution, and I want it to aggregate it to a 0.008333333, 0.008333333 (x, y)
resolution to the same extent.
loss_year
class : RasterLayer
dimensions : 70503, 59566, 4199581698 (nrow, ncol, ncell)
resolution : 0.00025, 0.00025 (x, y)
extent : -81.73875, -66.84725, -4.2285, 13.39725 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : /Volumes/LaCie/Deforestacion/Hansen/loss_year_mosaic_30m.tif
names : loss_year_mosaic_30m
values : 0, 13 (min, max)
I have tried a factor of ~33.33 following the description of the aggregate
help: "The number of cells is the number of cells of x divided by fact*fact
(when fact is a single number)." Nonetheless, the resulting raster data do not seem to have the same number of rows and columns as my other low-resolution rasters.
I have never used this high-resolution data, and I am also computationally limited (some of this commands can be parallelized using clusterR
, but sometimes they took the same time than the non-parallelized commands, especially since they do not work for nearest neighboor calculations).
I am short of ideas; maybe I can try layerize
to obtain a count raster, but I have to ´aggregate´ and the factor
problem arises. Since this processes are taking me days to process, I do want to know the most efficient way to create a lower resolution raster without losing much information
A reproducible example could be the following:
r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6) #Low resolution raster
First strategy: loss of information
r <- resample(r_hr, r_lr, method = "ngb") #The raster data is categorical
Second strategy: difficult to define an aggregate factor
r <- aggregate(r_hr, factor) #How to define a factor to get exactly the same number of cells of h_lr?
Another option: layerize
r_brick <- layerize(r_hr)
aggregate(r_brick, factor) #How to define factor to coincide with the r_lr dimensions?
Thanks for your help!
Upvotes: 6
Views: 12661
Reputation: 31452
It is pretty standard practice to aggregate land cover maps into layers of %cover. I.e you should aim to produce 13 layers, each being something like %cover in that grid cell. Doing this allows you to reduce the resolution while retaining as much information as possible. N.B if you require a different summary statistic than %
, should be easy to adapt the following method to whatever statistic you want, by changing the fun =
function in aggregate
.
The following method is pretty fast (it takes just a few seconds on my laptop to process raster with 100 million cells):
First, let's create some dummy rasters to use
Nhr <- 1e4 # resolution of high-res raster
Nlr <- 333 # resolution of low-res raster
r.hr <- raster(ncols=Nhr, nrows=Nhr)
r.lr <- raster(ncols=Nlr, nrows=Nlr)
r.hr[] <- sample(1:13, Nhr^2, replace=T)
Now, we begin by aggregating the high res raster to almost the same resolution as the low res one (to nearest integer number of cells). Each resulting layer contains the fraction of area within that cell in which value of original raster is N.
Nratio <- as.integer(Nhr/Nlr) # ratio of high to low resolutions, to nearest integer value for aggregation
layer1 <- aggregate(r.hr, Nratio, fun=function(x, na.rm=T) {mean(x==1, na.rm=na.rm)})
layer2 <- aggregate(r.hr, Nratio, fun=function(x, na.rm=T) {mean(x==2, na.rm=na.rm)})
And finally, resample low res raster to the desired resolution
layer1 <- resample(layer1, r.lr, method = "ngb")
layer2 <- resample(layer2, r.lr, method = "ngb")
repeat for each layer, and build your layers into a stack or a multi-band raster
Upvotes: 5
Reputation: 47061
r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6)
r_hr
#class : RasterLayer
#dimensions : 70, 70, 4900 (nrow, ncol, ncell)
#resolution : 5.142857, 2.571429 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
#data source : in memory
#names : layer
#values : 1, 5 (min, max)
r_lr
#class : RasterLayer
#dimensions : 6, 6, 36 (nrow, ncol, ncell)
#resolution : 60, 30 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Direct aggregate is not possible, because 70/6 is not an integer.
dim(r_hr)[1:2] / dim(r_lr)[1:2]
#[1] 11.66667 11.66667
Nearest neighbor resampling is not a good idea either as the results would be arbitrary.
Here is a by layer approach that you suggested and dww also showed already.
b <- layerize(r_hr)
fact <- round(dim(r_hr)[1:2] / dim(r_lr)[1:2])
a <- aggregate(b, fact)
x <- resample(a, r_lr)
Now you have proportions. If you want a single class you could do
y <- which.max(x)
In that case, another approach would be to aggregate the classes
ag <- aggregate(r_hr, fact, modal)
agx <- resample(ag, r_lr, method='ngb')
Note that agx
and y
are the same. But they can both be problematic as you might have cells with 5 classes with each about 20%, making it rather unreasonable to pick one winner.
Upvotes: 7