Reputation: 1772
I need to take data from 1303 rasters (each raster has data for 1 month) and make a time series for each grid cell in the rasters. In the end I will join all the time series into one massive (zoo) file.
I have the code that can do it (I tried on a small portion of the dataset and it worked) but it seems to be taking for ever just to stack the raster (more than 2 hours now and still counting) and this is not the slower part, that will be doing the time series. So here is my code, if anyone knows a faster way to stack rasters and /or to create the time series (maybe without the double loop?) please help...
I don't know any other programming language but would this be just too much to ask from R?
files <- list.files(pattern=".asc")
pat <- "^.*pet_([0-9]{1,})_([0-9]{1,}).asc$"
ord_files <- as.Date(gsub(pat, sprintf("%s-%s-01", "\\1", "\\2"), files))
files<-files[order(ord_files)]
#using "raster" package to import data
s<- raster(files[1])
pet<-vector()
for (i in 2:length(files))
{
r<- raster(files[i])
s <- stack(s, r)
}
#creating a data vector
beginning = as.Date("1901-01-01")
full <- seq(beginning, by='1 month', length=length(files))
dat<-as.yearmon(full)
#building the time series
for (lat in 1:360)
for (long in 1:720)
{
pet<-as.vector(s[lat,long])
x <- xts(pet, dat)
write.zoo(x,file=paste("P:/WRSRL/Users1/ncgk/IBERIA/cru_pet/zoo/","lat",lat,"long",long,".csv", sep="") , sep=",")
}
Upvotes: 2
Views: 3437
Reputation: 1
I tried another way to dealing with numerous files. First I combined the time series raster into one file in the NetCDF format, Using write.Raster(x,format="CDF",..) and then just read one file for each year, this time I used brick(netcdffile,varname='') it the reading saves a lot. However, I need to save each cell's value for all the years according to some predefined format,in which I use write.fwf(x=v,...,append=TRUE) but it takes a long time for nearly 500,000 points. Is anyone has the same experiences and help on how to speed up this process? Here is my code for extracting all the value for each point:
weather4Point <- function(startyear,endyear)
{
for (year in startyear:endyear)
{
#get the combined netCDF file
tminfile <- paste("tmin","_",year,".nc",sep='')
b_tmin <- brick(tminfile,varname='tmin')
pptfile <- paste("ppt","_",year,".nc",sep='')
b_ppt <- brick(pptfile,varname='ppt')
tmaxfile <- paste("tmax","_",year,".nc",sep='')
b_tmax <- brick(tmaxfile,varname='tmax')
#Get the first year here!!!
print(paste("processing year :",year,sep=''))
for(l in 1:length(pl))
{
v <- NULL
#generate file with the name convention with t_n(latitude)w(longitude).txt, 5 digits after point should be work
filename <- paste("c:/PRISM/MD/N",round(coordinates(pl[l,])[2],5),"W",abs(round(coordinates(pl[l,])[1],5)),".wth",sep='')
print(paste("processing file :",filename,sep=''))
tmin <- as.numeric(round(extract(b_tmin,coordinates(pl[l,])),digits=1))
tmax <- as.numeric(round(extract(b_tmax,coordinates(pl[l,])),digits=1))
ppt <- as.numeric(round(extract(b_ppt,coordinates(pl[l,])),digits=2))
v <- cbind(tmax,tmin,ppt)
tablename <- c("tmin","tmax","ppt")
v <- data.frame(v)
colnames(v) <- tablename
v["default"] <- 0
v["year"] <- year
date <- seq(as.Date(paste(year,"/1/1",sep='')),as.Date(paste(year,"/12/31",sep='')),"days")
month <- as.numeric(substr(date,6,7))
day <- as.numeric(substr(date,9,10))
v["month"] <- month
v["day"] <- day
v <- v[c("year","month","day","default","tmin","tmax","ppt")]
#write into a file with format
write.fwf(x=v,filename,append=TRUE,na="NA",rownames=FALSE,colnames=FALSE,width=c(6,3,3,5,5,5,6))
}
}
}
Upvotes: 0
Reputation: 47071
The first bit could simply be:
s <- stack(files)
The reason why creating a stack is somewhat slow is that each file needs to be opened and checked to see if it has the same nrow, ncol etc. as the other files. If you are absolutely certain that is the case, you can use a shortcut like this (NOT generally recommended)
quickStack <- function(f) {
r <- raster(f[1])
ln <- extension(basename(f), '')
s <- stack(r)
s@layers <- sapply(1:length(f), function(x){ r@file@name = f[x]; r@layernames=ln[x]; r@data@haveminmax=FALSE ; r })
s@layernames <- ln
s
}
quickStack(files)
You can probably also speed up the second part as in the below examples, depending on how much RAM you have.
Read row by row:
for (lat in 1:360) {
pet <- getValues(s, lat, 1)
for (long in 1:720) {
x <- xts(pet[long,], dat)
write.zoo(x,file=paste("P:/WRSRL/Users1/ncgk/IBERIA/cru_pet/zoo/","lat",lat,"long",long,".csv", sep="") , sep=",")
}
}
more extreme, read all values in one go:
pet <- getValues(s)
for (lat in 1:360) {
for (long in 1:720) {
cell <- (lat-1) * 720 + long
x <- xts(pet[cell,], dat)
write.zoo(x,file=paste("P:/WRSRL/Users1/ncgk/IBERIA/cru_pet/zoo/","lat",lat,"long",long,".csv", sep="") , sep=",")
}
}
Upvotes: 2
Reputation: 40813
Something like this should work (if you have enough memory):
#using "raster" package to import data
rlist <- lapply(files, raster)
s <- do.call(stack, rlist)
rlist <- NULL # to allow freeing of memory
It loads all raster
objects into a big list and then calls stack
once.
Here's an example of the speed gains: 1.25 sec vs 8 secs for 60 files - but your old code is quadratic in time so the gains are much higher for more files...
library(raster)
f <- system.file("external/test.grd", package="raster")
files <- rep(f, 60)
system.time({
rlist <- lapply(files, raster)
s <- do.call(stack, rlist)
rlist <- NULL # to allow freeing of memory
}) # 1.25 secs
system.time({
s<- raster(files[1])
for (i in 2:length(files)) {
r<- raster(files[i])
s <- stack(s, r)
}
}) # 8 secs
Upvotes: 1
Reputation: 9047
I will repost my comment here and give a better example:
The general idea: allocate the space for s before the 'raster'-loop is executed. If you concatenate s and r to a new object s inside the loop, R has to allocate new memory for s for each iteration. This is really slow, especially if s is large.
s <- c()
system.time(for(i in 1:1000){ s <- c(s, rnorm(100))})
# user system elapsed
# 0.584 0.244 0.885
s <- rep(NA, 1000*100)
system.time(for(i in seq(1,1000*100,100)){ s[i:(i+99)] <- rnorm(100) })
# user system elapsed
# 0.052 0.000 0.050
as you can see, pre-allocation is around 10 times faster.
Unfortunately I am not familiar with raster
and stack
so I can not tell you how to apply this to your code.
Upvotes: 1