Reputation: 25
I am interested in extracting the cell values alongside their corresponding x and y coordinates from a tif file accessible from the WorldPop database [ https://hub.worldpop.org/geodata/summary?id=49920 ].
I have converted this file, alongside other tif files available on this website, into rasters and then used the rasterToPoints function in R to extract this information. Although this approach has worked for most of the files, it has failed for this particular file amongst a few others: the R session appears to hang and the call never returns when I attempt to convert the rasters to SpatialPointsDataFrame (spdf) data.
library(raster)
Raster <- raster("C:/file path/aus_ppp_2020_UNadj_constrained.tif")
Raster <- rasterToPoints(Raster, spatial = TRUE)
As an alternative, I thought I could extract the coordinates after obtaining the cell values using getValues() or readAll(), but because the raster is so large I run into the following error:
Error: cannot allocate vector of size 17.8 Gb.
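Roughly, that attempt looked like this (a sketch of the approach rather than the exact code):
library(raster)
Raster <- raster("C:/file path/aus_ppp_2020_UNadj_constrained.tif")
vals <- getValues(Raster)                         # this is where the allocation fails
xy <- xyFromCell(Raster, seq_len(ncell(Raster)))  # coordinates for every cell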
sessionInfo()
# R version 4.2.0 (2022-04-22 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 22000)
library(memuse)
memuse::Sys.meminfo()
# Totalram: 31.781 GiB
# Freeram: 26.164 GiB
I then tried to see if I could raise the usable memory using memory.limit(), but this function has been retired as of R version 4.2 and I cannot find an alternative.
memory.limit()
# Warning: 'memory.limit()' is no longer supported
# [1] Inf
I was wondering if anyone knows:
1. If there is a way I could get the rasterToPoints function to work for this raster.
2. If there is a way to subset the raster to smaller rasters, while retaining all data, so that I could use the rasterToPoints function on each subset and then merge the resulting spatial point dataframes.
3. If there is an alternative way to extract the coordinates alongside the cell values for this tif file.
Any help is greatly appreciated.
Upvotes: 0
Views: 1653
Reputation: 1413
Although it seems like a questionable idea in general to break an efficient storage format apart - it's not like the relation between values and coordinates is lost in a raster - there is a pretty convenient way to do so using terra. However, this case seems to be an exception to the general rule since your raster is huge - 55,075 rows by 74,928 columns - with only 2,910,517 values being not NA according to global(r, "notNA"). To put it in other words, 99.9 % of the values of your raster are NA, so I see your point in trying to convert this.
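For reference, terra computes such summaries block-wise, so the NA share can be checked without reading the whole raster into RAM - a minimal sketch, assuming the file sits in the working directory:
library(terra)
r <- rast("aus_ppp_2020_UNadj_constrained.tif")
global(r, "notNA")[1, 1] / ncell(r)  # share of non-NA cells, about 0.0007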
Unfortunately, my first attempt had some flaws: the as.polygons(r) |> as.points() |> as.data.frame(geom = "XY") pipe only returned a result because of the dissolve = TRUE default reducing the number of objects, and the output consisted of the nodes of the dissolved polygons. So I'd really like to correct my answer making use of makeTiles(), which seems to be an appropriate approach when dealing with huge raster data and facing std::bad_alloc errors.
Read your raster r and create a second raster t which is used for tiling, using the extent and crs of r but with a significantly coarser resolution, so that each cell of t defines one tile:
library(terra)
#> terra 1.6.17
r <- rast("aus_ppp_2020_UNadj_constrained.tif")
r
#> class : SpatRaster
#> dimensions : 55075, 74928, 1 (nrow, ncol, nlyr)
#> resolution : 0.0008333333, 0.0008333333 (x, y)
#> extent : 96.81625, 159.2562, -55.11708, -9.22125 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326)
#> source : aus_ppp_2020_UNadj_constrained.tif
#> name : aus_ppp_2020_UNadj_constrained
#> min value : 0.000
#> max value : 1441.356
t <- rast(ext = ext(r), crs = crs(r), resolution = res(r) * 4000)
t
#> class : SpatRaster
#> dimensions : 14, 19, 1 (nrow, ncol, nlyr)
#> resolution : 3.333333, 3.333333 (x, y)
#> extent : 96.81625, 160.1496, -55.11708, -8.450416 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326)
makeTiles(r, t)
As a result, you'll find 14x19 tiles named tile_*.tif in your working directory - this process takes some minutes. Maybe there is a more elegant way to solve this without writing the subsets of your raster to disk, but on the other hand, you want to prevent RAM overflow, so writing to disk (temporarily) might be quite appropriate. And maybe there is no necessity to split your raster into 266 tiles - I picked the scaling factor of 4000 by trial and error, and determining the critical number of cells per tile first could speed this up a little bit. However, if time is not critical, this workflow should at least produce proper results.
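If you'd prefer to derive the tiling from an explicit cell budget rather than a magic factor, here is a rough sketch - the 5e7 cells per tile are an assumed budget, not a measured limit, so tune it to your RAM:
target <- 5e7                              # assumed maximum number of cells per tile
fact <- ceiling(sqrt(ncell(r) / target))   # number of tiles per side, here 10
t <- rast(ext = ext(r), crs = crs(r),
          resolution = res(r) * c(ncol(r), nrow(r)) / fact)
makeTiles(r, t)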
# list filenames of your tiles
tiles <- list.files(pattern = "tile_[0-9]+\\.tif$", full.names = TRUE)
n <- tiles |> length()
# init an empty data frame
result <- data.frame()
# iterate over all files;
# read tile, write values (if present) as data frame, gather results
for (i in 1:n) {
  rt <- rast(tiles[i])
  if (global(rt, "notNA")[1, 1] > 0) {
    x <- rt |> as.points() |> as.data.frame(geom = "XY")
    result <- rbind(result, x)
  }
}
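As an aside, growing result with rbind() inside the loop copies the accumulated data frame on every iteration; collecting the per-tile data frames in a list and binding once at the end is the same logic, but it scales better - a sketch:
dfs <- lapply(tiles, function(f) {
  rt <- rast(f)
  if (global(rt, "notNA")[1, 1] > 0) {
    rt |> as.points() |> as.data.frame(geom = "XY")
  }                            # all-NA tiles yield NULL and are dropped by rbind
})
result <- do.call(rbind, dfs)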
# inspect result
head(result)
#> aus_ppp_2020_UNadj_constrained x y
#> 1 0.2376212 130.0342 -11.75917
#> 2 0.2404007 130.0350 -11.75917
#> 3 0.2336724 130.0358 -11.75917
#> 4 1.3149673 113.2033 -26.00583
#> 5 1.2930205 113.2033 -26.00667
#> 6 1.2527549 113.1792 -26.11167
dim(result)
#> [1] 2910517 3
# save data frame encompassing all values as csv file
write.csv(result, "aus_ppp_2020_UNadj_constrained.csv", row.names = FALSE)
Since result consists of 2,910,517 rows - matching the notNA count above - I'd consider this approach robust.
Upvotes: 0
Reputation: 199
The first step would be to question why you are converting the raster to a data frame in the first place. There is a reason R can handle the full raster but not all the points as a data frame: a raster is a very efficient structure for working with data.
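For instance, if you only need values at a handful of locations, extract() reads them straight from the raster without materialising every cell - a quick sketch, assuming the tif sits in your working directory and using two made-up coordinates:
library(raster)
r <- raster("aus_ppp_2020_UNadj_constrained.tif")
pts <- cbind(x = c(144.96, 151.21), y = c(-37.81, -33.87))  # approx. Melbourne and Sydney
extract(r, pts)  # returns the cell values at those coordinates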
To chop the raster up, hopefully this method will work:
library(raster)
library(sf)
library(dplyr)
## download raster from source
r <- raster("https://data.worldpop.org/GIS/Population/Global_2000_2020_Constrained/2020/BSGM/AUS/aus_ppp_2020_UNadj_constrained.tif")
## determine the extent of the raster layer
x_min <- r@extent@xmin
x_max <- r@extent@xmax
y_min <- r@extent@ymin
y_max <- r@extent@ymax
## create a df
d <- data.frame(X = c(x_min, x_max, x_max, x_min),
                Y = c(y_max, y_max, y_min, y_min))
## generate an sf of the raster domain
d_sf <- d %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326) %>%
  dplyr::summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON")
## define the grid resolution (i.e. the size of the grids the domain will be divided into)
res <- 10
## create the grid
d_grid <- st_make_grid(d_sf, square = TRUE, n = c(res, res), crs = 4326) %>% # the grid, covering the bounding box
  st_sf()
## pick a cell
random_cell <- d_grid[44,]
## crop the raster to this cell
r_crop <- crop(r, random_cell)
## check you have data
plot(r_crop)
## raster to points
pts <- rasterToPoints(r_crop)
## to trim the data to a specific location can use the osmdata package to get the extent of a place and crop to that
library(osmdata)
## go for Melbourne
p <- osmdata::getbb("Melbourne")
r_place_crop <- crop(r, p)
plot(r_place_crop)
place_pts <- rasterToPoints(r_place_crop)
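If you need the points for the whole raster rather than a single cell, the same crop-and-convert step can be applied to every grid cell and the pieces bound together - a sketch following the pattern above, not tested on the full dataset:
## iterate over all grid cells, collecting points from non-empty crops
pts_all <- lapply(seq_len(nrow(d_grid)), function(i) {
  p <- rasterToPoints(crop(r, d_grid[i, ]))  # NA cells are dropped automatically
  if (nrow(p) > 0) p                         # empty cells yield NULL
})
pts_all <- do.call(rbind, pts_all)           # matrix of x, y and cell values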
Upvotes: 0