usel

Reputation: 23

Why can I not work with RDS files after saving?

I processed a RasterStack and saved it as an RDS file. I work in R projects and, in case the working directory changes, I include the full path when saving an RDS file. I work on the same computer and in the same directory, but a few days later I realized that the saved RDS file is useless because the temporary file R creates when saving it is missing.

I would also like to be able to save an RDS file on, say, RStudio Server and then use it on my local machine, and vice versa.

How can I make sure the actual file is saved in a proper location, so that I, or other people on different machines, can still use it later?

I found a thread very similar to my problem, but no real solution (Issue with saveRDS).

Here is a minimal example

library(raster)

dir <- list.dirs(path = "C:/mypath/")
# list the files in each directory
sin <- lapply(seq_along(dir), function(i) {
  list.files(dir[i], full.names = TRUE)
})

# build one RasterStack per directory
sin <- lapply(seq_along(sin), function(f) {
  stack(sin[[f]])
})
saveRDS(sin, "C:/mypath/temp_sinlist.rds")

Now everything seems fine. But when I open a new session later, or use the file on a different machine, this happens:

myrasterstack <- readRDS("C:/mypath/temp_sinlist.rds")

It looks all right when I print it:

myrasterstack

> class       : RasterStack 
dimensions  : 13061, 13271, 173332531, 5  (nrow, ncol, ncell, nlayers)
resolution  : 30, 30  (x, y)
extent      : 379185, 777315, 523185, 915015  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=37 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
names       : LC08_L1TP_201602_01_T1_B10_TIR, LC08_L1TP_201602_01_T1_B11_TIR, LC08_L1TP_201602_01_T1_B02_BLUE, LC08_L1TP_201602_01_T1_B03_GREEN, LC08_L1TP_201602_01_T1_B04_RED 
min values  :                          17343,                          16761,                            6845,                             5941,                           5525 
max values  :                          45672,                          37158,                           65535,                            65535,                          65535 

But in truth, the underlying file is not actually present:

plot(myrasterstack)

> Error in file(fn, "rb") : cannot open the connection
In addition: Warning message:
In file(fn, "rb") :
  cannot open file 'C:\Users\username\AppData\Local\Temp\Rtmp8mDPNi\raster\r_tmp_2018-07-31_134754_7296_99375.gri': No such file or directory

I don't want to use RDATA instead because of the overwriting issues.

Upvotes: 1

Views: 2736

Answers (2)

Robert Hijmans

Reputation: 47156

To make a reproducible example, you can create a RasterStack like this:

library(raster)
sin <- stack(system.file("external/rlogo.grd", package="raster")) 

A RasterStack (like other Raster* objects) normally points to files on disk (but not via C++ pointers). The reason for this 'virtualization' is that the files used in this domain are often much too large to read into RAM, as would be standard practice in R. It is therefore a bad idea to save such an object between sessions, although it should work as long as you are on the same machine and the file is permanent (not in the temp folder). Rather, you should save the data as a raster file and reuse that. This is a feature, not a bug: it helps when dealing with large files and avoids making unnecessary copies of them.
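To see this for yourself, you can ask a Raster* object whether its values are held in memory and which file backs it. This is only a small sketch using the example stack from above; `inMemory()` and `filename()` are functions from the raster package.

```r
library(raster)

# Example stack that is backed by a file shipped with the raster package
sin <- stack(system.file("external/rlogo.grd", package = "raster"))

# Should be FALSE here: the cell values have not been read into RAM
inMemory(sin)

# Path of the file behind the first layer; an .rds of 'sin' would store
# this path, not the cell values themselves
filename(sin[[1]])
```

If that path points into a temporary folder, as in the error message in the question, the file is gone once the session that created it ends.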

Thus, instead of saveRDS, you should use writeRaster, like this:

writeRaster(sin, "temp_sinlist.grd")

(or use another file format, as determined by the filename extension).
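Since the question works with a list of RasterStacks, a possible way to apply this is to write one file per list element and rebuild the list from those files later. The following is a sketch under assumptions: the two-element list, the `out_dir` folder and the `sin_01.grd`-style file names are made up for illustration, and in real use `out_dir` should be a permanent folder rather than one under `tempdir()`.

```r
library(raster)

# Stand-in for the list of RasterStacks from the question
logo <- system.file("external/rlogo.grd", package = "raster")
sin <- list(stack(logo), stack(logo))

# Assumption: use a permanent folder in real use; tempdir() is only for the demo
out_dir <- file.path(tempdir(), "sin_rasters")
dir.create(out_dir, showWarnings = FALSE)

# One raster file per list element (hypothetical naming scheme)
files <- file.path(out_dir, sprintf("sin_%02d.grd", seq_along(sin)))
for (i in seq_along(sin)) {
  writeRaster(sin[[i]], files[i], overwrite = TRUE)
}

# In a later session (or on another machine, after copying the folder):
restored <- lapply(files, stack)
```

Note that the native raster format consists of a `.grd` header and a `.gri` data file, so both need to be copied when moving the data.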

If you insist on using rds, and your dataset is relatively small, you can first read all values into memory:

sin <- readAll(sin)
saveRDS(sin, "temp_sinlist.rds")
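A quick round trip can confirm that the restored object no longer depends on a file. Again just a sketch with the example stack; the `.rds` is written to `tempdir()` only for demonstration.

```r
library(raster)

sin <- stack(system.file("external/rlogo.grd", package = "raster"))

# Pull all cell values into memory, then serialize the object
sin_mem <- readAll(sin)
rds <- file.path(tempdir(), "temp_sinlist.rds")
saveRDS(sin_mem, rds)

# Read it back and check that the values travelled with the object
restored <- readRDS(rds)
inMemory(restored)  # expected to be TRUE after readAll()
```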

Upvotes: 2

r2evans

Reputation: 160447

I think it makes more sense to wrap the object so that you can optionally save it, or a list of them.

RasterStack2list <- function(object, ...) {
  fn <- normalizePath(slot(object, "name"))
  md5 <- tools::md5sum(fn)
  contents <- readBin(fn, raw(1), n=file.info(fn)$size)
  list(object=object, path=fn, contents=contents, md5=md5)
}

list2RasterStack <- function(object, ..., sametempdir=FALSE, overwrite=FALSE) {
  if (! all(c("md5", "object", "path", "contents") %in% names(object))) {
    warning("no 'path' stored, object might be incomplete", call.=FALSE)
    return(object)
  }
  if (sametempdir) {
    dirn <- dirname(object[["path"]])
    if (!dir.exists(dirn)) dir.create(dirn, recursive=TRUE)
  } else {
    dirn <- tempdir()
  }
  fn <- file.path(dirn, basename(object[["path"]]))
  if (!sametempdir) slot(object[["object"]], "name") <- fn
  if (file.exists(fn) && !overwrite) {
    stop("file exists but 'overwrite' not true: ", sQuote(fn))
  }
  writeBin(object[["contents"]], fn)
  outmd5 <- tools::md5sum(fn)
  if (outmd5 != object[["md5"]]) {
    warning("file contents md5 checksum incorrect; expecting ",
            sQuote(object[["md5"]]), ", actually ", sQuote(outmd5),
            call.=FALSE)
  }
  return(object[["object"]])
}

Intended use, assuming sin is a list of objects:

saveRDS(lapply(sin, RasterStack2list), file="somewhere.rds")
# restart R, different computer, something else ...
newsin <- lapply(readRDS("somewhere.rds"), list2RasterStack)
# should be able to plot newsin[[1]], for instance

If you need to add functionality, here's the basic premise/breakdown:

  • this supports whatever saving mechanism you want, so use save or saveRDS or perhaps even jsonlite::toJSON if you really want;
  • this assumes that the only slot that we need to save is "name"; if you try this and there are other missing/wrong slots on restoration, you should be able to add steps to preserve them as well;
  • I added an MD5 checksum because I've been bitten by endianness when I was intentionally playing and trying to break things (on another project), so it is highly likely to be unnecessary here (but barely slows things down); and
  • the option sametempdir= is meant to try to stiff-arm a problem in case the path is hard-coded in the raster object. I have no reason to believe it is, but test first on the same computer without this option, and if something fails, try setting sametempdir=TRUE. If this latter call works, then these wrapper functions will not support (yet) moving between different computers (that do not share the relevant filesystem).
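The basic premise (carry the file's bytes and a checksum along inside the saved object, then write them back out on restore) can be tried in isolation with base R only. This is a minimal sketch, not the wrapper above: `bundle_files()` and `restore_files()` are hypothetical helper names, and the small `.gri` file is a dummy created just for the demo.

```r
# Hypothetical helper: read each file into a raw vector and record its md5
bundle_files <- function(paths) {
  lapply(paths, function(p) {
    list(name = basename(p),
         md5 = unname(tools::md5sum(p)),
         contents = readBin(p, what = "raw", n = file.size(p)))
  })
}

# Hypothetical helper: write the bytes back into 'dir' and verify the checksum
restore_files <- function(bundle, dir = tempdir()) {
  vapply(bundle, function(b) {
    out <- file.path(dir, b$name)
    writeBin(b$contents, out)
    if (!identical(unname(tools::md5sum(out)), b$md5)) {
      warning("md5 mismatch for ", out, call. = FALSE)
    }
    out
  }, character(1))
}

# Demo with a dummy data file standing in for a raster's backing file
src <- tempfile(fileext = ".gri")
writeBin(as.raw(1:10), src)

rds <- tempfile(fileext = ".rds")
saveRDS(bundle_files(src), rds)
unlink(src)  # simulate the temporary file disappearing

target <- file.path(tempdir(), "restored")
dir.create(target, showWarnings = FALSE)
restored <- restore_files(readRDS(rds), dir = target)
```

Because the helpers accept a vector of paths, a format that spreads data over several files could be bundled in one call.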

Upvotes: 0
