Ha Ma
Ha Ma

Reputation: 23

Efficiently access data from raster stack in R

I need to efficiently extract temperature data from grib2 files (which i put in a raster stack). Each raster layer in the stack represents a point in time.

Now I need to extract one single value for each observation (x,y,t). The following code does the job, but it takes too much time. Any suggestions to improve efficiency are much appreciated.

files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE)

s <- stack(files)

userdata$x <- sample(1:ncol(s), nrow(userdata), replace=T)
userdata$y <- sample(1:nrow(s), nrow(userdata), replace=T)

smalldata <- userdata[1:100,]
tic()
smalldata$temp1morning <- getValues(s[[smalldata$t]], smalldata$y)[smalldata$x]
toc()

EDIT: loki's answer is really useful. When I bring this approach to my temperature data, however, it is really slow. I suspect that this is caused by the structure of my temperature data, with many time periods. See below for a comparison between the proposed approach and a comparable try with getValues. Any ideas why this is the case or how I could improve the code?

> files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE, pattern = glob2rx("*06.f003.grib*"))
> 
> s <- stack(files)
> s
class       : RasterStack 
dimensions  : 197, 821, 161737, 971  (nrow, ncol, ncell, nlayers)
resolution  : 0.25, 0.25  (x, y)
extent      : 190.875, 396.125, 22.875, 72.125  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +a=6371229 +b=6371229 +no_defs 
names       : gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, ... 

> 
> userdata$x <- sample(1:ncol(s), nrow(userdata), replace=T)
> userdata$y <- sample(1:nrow(s), nrow(userdata), replace=T)
> 
> smalldata <- data.frame(x = userdata$x[1:2],
+ y = userdata$y[1:2],
+ t = userdata$t[1:2])
> 
> smalldata
    x  y   t
1 142 67 547
2 779 14 829
> 
> tic("apply")
> smalldata$temp1morning <- apply(smalldata, 1, function(x){s[x[2], x[1]][x[3]]})
> toc()
apply: 305.41 sec elapsed
> 
> tic("getValues")
> smalldata$temp2morning <- apply(smalldata, 1, function(x){getValues(s[[x[3]]], x[2])[x[1]]})
> toc()
getValues: 0.33 sec elapsed
> 
> smalldata
    x  y   t temp1morning temp2morning
1 142 67 547    13.650018    13.650018
2 779 14 829    -1.750006    -1.750006
> 

Upvotes: 2

Views: 1601

Answers (2)

Ha Ma
Ha Ma

Reputation: 23

I found a simple solution that works for me. First, I get my temperature data to an array using as.array. Then, I use apply on the array as loki suggested:

files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE, pattern = glob2rx("*06.f003.grib*"))

s <- stack(files)
a <- as.array(s)

userdata$x <- sample(1:ncol(s), nrow(userdata), replace=T)
userdata$y <- sample(1:nrow(s), nrow(userdata), replace=T)

smalldata <- data.frame(x = userdata$x[1:nrow(userdata)],
            y = userdata$y[1:nrow(userdata)],
            t = userdata$t[1:nrow(userdata)])

tic("array")
userdata$temp1morning <- apply(smalldata, 1, function(x){a[x[2], x[1], x[3]]})
toc()

This is easily fast enough for my purposes. loki, thanks for your help!

Upvotes: 0

loki
loki

Reputation: 10350

Let's start with a reproducible example:

library(raster)
r <- raster(ncol = 100, nrow = 100)
r[] <- runif(ncell(r))

s <- stack(r, r, r)
s

Now, we assume that your userdata holds the following structure:

  • x is the index of the pixels row
  • y is the index of the pixels column
  • t is the index of the layer, we are looking for the pixel in

let's create a reproducible userdata:

userdata <- data.frame(x = sample(1:100, 10), 
                       y = sample(1:100, 10), 
                       t = sample(1:3, 10, replace = T))

We then can use apply to work through all the lines in userdata and use the indices of the row, col, and layer to extract the values:

userdata$pixelvalue <- apply(userdata, 1, function(x){s[x[1], x[2]][x[3]]})

In each iteration of the apply a pixel is selected by its x and y position in the raster for all the layers. x[3] then returns only the value of the corresponding layer.

This follows the logic:

stack[*row*, *column*][*layer*]

The advantage to your approach is, that you don't have to convert the whole raster to a vector (which is basically what getValues does) but to directly access the data as a matrix structure of the RasterStack.

Upvotes: 1

Related Questions