sbg

Reputation: 1772

Increase performance/speed

I need to take data from 1303 rasters (each raster has data for one month) and make a time series for each grid cell in the rasters. In the end I will join all the time series into one massive (zoo) file.

I have code that can do it (I tried it on a small portion of the dataset and it worked), but it seems to be taking forever just to stack the rasters (more than 2 hours now and still counting), and this is not even the slowest part; that will be building the time series. So here is my code. If anyone knows a faster way to stack rasters and/or to create the time series (maybe without the double loop?), please help...

I don't know any other programming language, but would this be too much to ask of R?

library(raster); library(zoo); library(xts)

# list the monthly .asc files and sort them chronologically
files <- list.files(pattern="\\.asc$")
pat <- "^.*pet_([0-9]{1,})_([0-9]{1,})\\.asc$"
ord_files <- as.Date(gsub(pat, sprintf("%s-%s-01", "\\1", "\\2"), files))
files <- files[order(ord_files)]


#using "raster" package to import data 
s<- raster(files[1])
pet<-vector()
for (i in 2:length(files))
{
r<- raster(files[i])
s <- stack(s, r)
}

# creating a date vector
beginning <- as.Date("1901-01-01")
full <- seq(beginning, by='1 month', length=length(files))
dat <- as.yearmon(full)

# building the time series
for (lat in 1:360) {
  for (long in 1:720) {
    pet <- as.vector(s[lat, long])
    x <- xts(pet, dat)
    write.zoo(x, file=paste("P:/WRSRL/Users1/ncgk/IBERIA/cru_pet/zoo/",
                            "lat", lat, "long", long, ".csv", sep=""), sep=",")
  }
}

Upvotes: 2

Views: 3437

Answers (4)

Peter

Reputation: 1

I tried another way of dealing with numerous files. First I combined the rasters of the time series into one file per year in NetCDF format, using writeRaster(x, format="CDF", ...), and then read one file for each year; this time I used brick(netcdffile, varname=''), and the reading saves a lot of time (a short sketch of that combining step appears after the code below).

However, I need to save each cell's values for all the years in a predefined format, for which I use write.fwf(x=v, ..., append=TRUE), but this takes a long time for nearly 500,000 points. Does anyone have the same experience and advice on how to speed up this process? Here is my code for extracting all the values for each point:

library(raster) # brick, extract
library(gdata)  # write.fwf

# 'pl' is a SpatialPoints object holding the target points (defined elsewhere)
weather4Point <- function(startyear, endyear)
{
  for (year in startyear:endyear)
  {
    # get the combined netCDF file for each variable
    tminfile <- paste("tmin", "_", year, ".nc", sep='')
    b_tmin <- brick(tminfile, varname='tmin')

    pptfile <- paste("ppt", "_", year, ".nc", sep='')
    b_ppt <- brick(pptfile, varname='ppt')

    tmaxfile <- paste("tmax", "_", year, ".nc", sep='')
    b_tmax <- brick(tmaxfile, varname='tmax')

    print(paste("processing year :", year, sep=''))

    for (l in 1:length(pl))
    {
      # file name convention N(latitude)W(longitude).wth, 5 digits after the point
      filename <- paste("c:/PRISM/MD/N", round(coordinates(pl[l,])[2], 5),
                        "W", abs(round(coordinates(pl[l,])[1], 5)), ".wth", sep='')
      print(paste("processing file :", filename, sep=''))

      # extract the daily values of each variable for this point
      tmin <- as.numeric(round(extract(b_tmin, coordinates(pl[l,])), digits=1))
      tmax <- as.numeric(round(extract(b_tmax, coordinates(pl[l,])), digits=1))
      ppt  <- as.numeric(round(extract(b_ppt,  coordinates(pl[l,])), digits=2))

      v <- data.frame(cbind(tmax, tmin, ppt))
      colnames(v) <- c("tmax", "tmin", "ppt") # names must match the cbind order
      v["default"] <- 0
      v["year"] <- year

      # add month and day columns derived from the dates of the year
      date <- seq(as.Date(paste(year, "/1/1", sep='')),
                  as.Date(paste(year, "/12/31", sep='')), "days")
      v["month"] <- as.numeric(substr(date, 6, 7))
      v["day"]   <- as.numeric(substr(date, 9, 10))

      v <- v[c("year", "month", "day", "default", "tmin", "tmax", "ppt")]

      # append this point's rows to its fixed-width file
      write.fwf(x=v, filename, append=TRUE, na="NA", rownames=FALSE, colnames=FALSE,
                width=c(6, 3, 3, 5, 5, 5, 6))
    }
  }
}
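For reference, here is a minimal sketch of the combining step described above, assuming the rasters for one variable and one year are already listed (in date order) in files; the file and variable names are illustrative:

# combine one year of rasters into a single NetCDF file
library(raster)
s <- stack(files) # 'files': that year's rasters, in date order
writeRaster(s, filename="tmin_1981.nc", format="CDF",
            varname="tmin", overwrite=TRUE)
# later, the whole year can be read back in one call
b_tmin <- brick("tmin_1981.nc", varname='tmin')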

Upvotes: 0

Robert Hijmans

Reputation: 47071

The first bit could simply be:

s <- stack(files) 

The reason creating a stack is somewhat slow is that each file needs to be opened and checked to see whether it has the same nrow, ncol, etc. as the other files. If you are absolutely certain that is the case, you can use a shortcut like this (NOT generally recommended):

quickStack <- function(f) {
  r <- raster(f[1])
  ln <- extension(basename(f), '')
  s <- stack(r)
  s@layers <- sapply(1:length(f), function(x){ r@file@name = f[x]; r@layernames = ln[x]; r@data@haveminmax = FALSE; r })
  s@layernames <- ln
  s
}

quickStack(files)

You can probably also speed up the second part as in the examples below, depending on how much RAM you have.

Read row by row:

for (lat in 1:360) {
  pet <- getValues(s, lat, 1)
  for (long in 1:720) {
    x <- xts(pet[long,], dat)
    write.zoo(x, file=paste("P:/WRSRL/Users1/ncgk/IBERIA/cru_pet/zoo/",
                            "lat", lat, "long", long, ".csv", sep=""), sep=",")
  }
}

More extreme: read all values in one go:

pet <- getValues(s)
for (lat in 1:360) {
  for (long in 1:720) {
    cell <- (lat-1) * 720 + long
    x <- xts(pet[cell,], dat)
    write.zoo(x, file=paste("P:/WRSRL/Users1/ncgk/IBERIA/cru_pet/zoo/",
                            "lat", lat, "long", long, ".csv", sep=""), sep=",")
  }
}

Upvotes: 2

Tommy

Reputation: 40813

Something like this should work (if you have enough memory):

#using "raster" package to import data 
rlist <- lapply(files, raster)
s <- do.call(stack, rlist)
rlist <- NULL # to allow freeing of memory

It loads all raster objects into a big list and then calls stack once.

Here's an example of the speed gains: 1.25 secs vs 8 secs for 60 files. But your old code is quadratic in time, so the gains are much higher for more files...

library(raster)
f <- system.file("external/test.grd", package="raster")
files <- rep(f, 60)

system.time({
 rlist <- lapply(files, raster)
 s <- do.call(stack, rlist)
 rlist <- NULL # to allow freeing of memory
}) # 1.25 secs

system.time({
  s <- raster(files[1])
  for (i in 2:length(files)) {
    r <- raster(files[i])
    s <- stack(s, r)
  }
}) # 8 secs

Upvotes: 1

smu

Reputation: 9047

I will repost my comment here and give a better example:

The general idea: allocate the space for s before the raster loop is executed. If you concatenate s and r into a new object s inside the loop, R has to allocate new memory for s in each iteration. That is really slow, especially when s is large.

s <- c()
system.time(for(i in 1:1000){ s <- c(s, rnorm(100))})
# user  system elapsed 
# 0.584   0.244   0.885 

s <- rep(NA, 1000*100)
system.time(for(i in seq(1,1000*100,100)){ s[i:(i+99)] <- rnorm(100) })
# user  system elapsed 
# 0.052   0.000   0.050

As you can see, pre-allocation is around 10 times faster.

Unfortunately, I am not familiar with raster and stack, so I cannot tell you how to apply this to your code.
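For the stacking step in the question, the equivalent of pre-allocation would be to create a list of the right length, fill it, and call stack once at the end; a sketch along those lines (assuming files holds the sorted file names, much like Tommy's answer):

library(raster)
# pre-allocate the list instead of growing the stack in each iteration
rlist <- vector("list", length(files))
for (i in seq_along(files)) {
  rlist[[i]] <- raster(files[i])
}
s <- stack(rlist) # a single stack() call at the end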

Upvotes: 1
