Reputation: 23
I need to efficiently extract temperature data from grib2 files (which i put in a raster stack). Each raster layer in the stack represents a point in time.
Now I need to extract one single value for each observation (x,y,t). The following code does the job, but it takes too much time. Any suggestions to improve efficiency are much appreciated.
files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE)
s <- stack(files)
userdata$x <- sample(1:ncol(s), nrow(userdata), replace=T)
userdata$y <- sample(1:nrow(s), nrow(userdata), replace=T)
smalldata <- userdata[1:100,]
tic()
smalldata$temp1morning <- getValues(s[[smalldata$t]], smalldata$y)[smalldata$x]
toc()
EDIT: loki's answer is really useful. When I bring this approach to my temperature data, however, it is really slow. I suspect that this is caused by the structure of my temperature data, with many time periods. See below for a comparison between the proposed approach and a comparable try with getValues
. Any ideas why this is the case or how I could improve the code?
> files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE, pattern = glob2rx("*06.f003.grib*"))
>
> s <- stack(files)
> s
class : RasterStack
dimensions : 197, 821, 161737, 971 (nrow, ncol, ncell, nlayers)
resolution : 0.25, 0.25 (x, y)
extent : 190.875, 396.125, 22.875, 72.125 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +a=6371229 +b=6371229 +no_defs
names : gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, gfs.0p25.//mayr258302, ...
>
> userdata$x <- sample(1:ncol(s), nrow(userdata), replace=T)
> userdata$y <- sample(1:nrow(s), nrow(userdata), replace=T)
>
> smalldata <- data.frame(x = userdata$x[1:2],
+ y = userdata$y[1:2],
+ t = userdata$t[1:2])
>
> smalldata
x y t
1 142 67 547
2 779 14 829
>
> tic("apply")
> smalldata$temp1morning <- apply(smalldata, 1, function(x){s[x[2], x[1]][x[3]]})
> toc()
apply: 305.41 sec elapsed
>
> tic("getValues")
> smalldata$temp2morning <- apply(smalldata, 1, function(x){getValues(s[[x[3]]], x[2])[x[1]]})
> toc()
getValues: 0.33 sec elapsed
>
> smalldata
x y t temp1morning temp2morning
1 142 67 547 13.650018 13.650018
2 779 14 829 -1.750006 -1.750006
>
Upvotes: 2
Views: 1601
Reputation: 23
I found a simple solution that works for me. First, I get my temperature data to an array using as.array
. Then, I use apply
on the array as loki suggested:
files <- list.files(path="Weather/NCEP/temperature_3hour_forecast", full.names = TRUE, pattern = glob2rx("*06.f003.grib*"))
s <- stack(files)
a <- as.array(s)
userdata$x <- sample(1:ncol(s), nrow(userdata), replace=T)
userdata$y <- sample(1:nrow(s), nrow(userdata), replace=T)
smalldata <- data.frame(x = userdata$x[1:nrow(userdata)],
y = userdata$y[1:nrow(userdata)],
t = userdata$t[1:nrow(userdata)])
tic("array")
userdata$temp1morning <- apply(smalldata, 1, function(x){a[x[2], x[1], x[3]]})
toc()
This is easily fast enough for my purposes. loki, thanks for your help!
Upvotes: 0
Reputation: 10350
Let's start with a reproducible example:
library(raster)
r <- raster(ncol = 100, nrow = 100)
r[] <- runif(ncell(r))
s <- stack(r, r, r)
s
Now, we assume that your userdata
holds the following structure:
x
is the index of the pixels rowy
is the index of the pixels columnt
is the index of the layer, we are looking for the pixel inlet's create a reproducible userdata:
userdata <- data.frame(x = sample(1:100, 10),
y = sample(1:100, 10),
t = sample(1:3, 10, replace = T))
We then can use apply
to work through all the lines in userdata
and use the indices of the row, col, and layer to extract the values:
userdata$pixelvalue <- apply(userdata, 1, function(x){s[x[1], x[2]][x[3]]})
In each iteration of the apply
a pixel is selected by its x and y position in the raster for all the layers. x[3]
then returns only the value of the corresponding layer.
This follows the logic:
stack[*row*, *column*][*layer*]
The advantage to your approach is, that you don't have to convert the whole raster to a vector (which is basically what getValues
does) but to directly access the data as a matrix structure of the RasterStack
.
Upvotes: 1