Reputation: 23
I processed a RasterStack and saved it as an RDS file. I work in R projects and, in case the working directory changes, I include the full path when saving an RDS file. I work on the same computer and in the same directory; however, I realized a few days later that the saved RDS file is useless, because a temporary file that R created while saving it is missing.
I would also like to be able to save an RDS file on, say, RStudio Server and then use it on my local machine, and vice versa.
How can I make sure the actual data are saved in a permanent location, so that I, or other people on different machines, can use the file later?
I found a thread very similar to my problem, but it has no real solution (Issue with saveRDS).
Here is a minimal example:
library(raster)
dirs <- list.dirs(path = "C:/mypath/")
# Full paths of the files in each directory
sin <- lapply(seq_along(dirs), function(i) {
  list.files(dirs[i], full.names = TRUE)
})
# One RasterStack per directory
sin <- lapply(seq_along(sin), function(f) {
  stack(sin[[f]])
})
saveRDS(sin, "C:/mypath/temp_sinlist.rds")
At this point everything seems fine. But when I open a new session later, or use the file on a different machine, this happens:
myrasterstack <- readRDS("C:/mypath/temp_sinlist.rds")
It looks all right when I print it:
myrasterstack
> class : RasterStack
dimensions : 13061, 13271, 173332531, 5 (nrow, ncol, ncell, nlayers)
resolution : 30, 30 (x, y)
extent : 379185, 777315, 523185, 915015 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=37 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
names : LC08_L1TP_201602_01_T1_B10_TIR, LC08_L1TP_201602_01_T1_B11_TIR, LC08_L1TP_201602_01_T1_B02_BLUE, LC08_L1TP_201602_01_T1_B03_GREEN, LC08_L1TP_201602_01_T1_B04_RED
min values : 17343, 16761, 6845, 5941, 5525
max values : 45672, 37158, 65535, 65535, 65535
But in fact the underlying file is no longer there:
plot(myrasterstack)
> Error in file(fn, "rb") : cannot open the connection
In addition: Warning message:
In file(fn, "rb") :
cannot open file 'C:\Users\username\AppData\Local\Temp\Rtmp8mDPNi\raster\r_tmp_2018-07-31_134754_7296_99375.gri': No such file or directory
I don't want to use .RData files instead because of the overwriting issues.
Upvotes: 1
Views: 2736
Reputation: 47156
To make a reproducible example, you can create a RasterStack like this:
library(raster)
sin <- stack(system.file("external/rlogo.grd", package="raster"))
A RasterStack (like other Raster* objects) normally points to files on disk (but not via C++ pointers). The reason for this 'virtualization' is that the files used in this domain are often much too large to read into RAM, as would be standard practice in R. It is therefore a bad idea to save such an object between sessions, although it should work as long as you are on the same machine and the file is permanent (not in the temp folder). Rather, you should save the data as a raster file and reuse that. This is a feature, not a bug: it helps when dealing with large files and avoids making unnecessary copies of them.
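To see this for a given object, you can ask raster whether the values are in memory and which file backs them. The following is a minimal sketch using the demo file from above; the variable names and the temp-folder check are illustrative assumptions, not part of the raster API.

```r
library(raster)

# Demo file shipped with the raster package (same one used above).
sin <- stack(system.file("external/rlogo.grd", package = "raster"))

# Are the cell values held in RAM, or read from disk on demand?
in_ram  <- inMemory(sin)
on_disk <- fromDisk(sin)

# Path of the file that backs the first layer. If this points into
# tempdir(), the file disappears when the R session ends.
backing_file <- filename(sin[[1]])
in_tempdir <- startsWith(normalizePath(backing_file, winslash = "/"),
                         normalizePath(tempdir(), winslash = "/"))
```

If `in_tempdir` is TRUE for any layer, an RDS copy of the object will stop working once the session that created it is closed.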
Thus, instead of saveRDS, you should use writeRaster, like this:
writeRaster(sin, "temp_sinlist.grd")
(or use another file format, as determined by the filename extension).
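For a list of stacks, as in the question, the same idea can be applied per element. This is a rough sketch only: the demo list, the `out_dir` folder and the file naming scheme are assumptions made for illustration, and in practice `out_dir` should be a permanent project folder rather than `tempdir()`.

```r
library(raster)

# Stand-in for the question's list of stacks: two copies of the demo
# raster shipped with the raster package.
demo <- system.file("external/rlogo.grd", package = "raster")
sin <- list(stack(demo), stack(demo))

# Hypothetical output folder; use a permanent directory in real work.
out_dir <- file.path(tempdir(), "stacks")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Write each stack to its own file and keep the file paths.
paths <- file.path(out_dir, sprintf("stack_%02d.grd", seq_along(sin)))
for (i in seq_along(sin)) {
  writeRaster(sin[[i]], paths[i], overwrite = TRUE)
}

# In a later session (or on another machine, after copying the folder
# with both the .grd and .gri files), rebuild the list from the files.
restored <- lapply(paths, stack)
```

The restored stacks point at the files in `out_dir`, so they stay usable for as long as those files exist.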
If you insist on using rds, and your dataset is relatively small, you can first read all values into memory:
sin <- readAll(sin)
saveRDS(sin, "temp_sinlist.rds")
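Applied to a list of stacks, that might look like the sketch below. The `canProcessInMemory` guard is an extra precaution added here, not something the approach requires, and the demo list and `rds_path` are placeholders for illustration.

```r
library(raster)

# Stand-in for the question's list of stacks.
demo <- system.file("external/rlogo.grd", package = "raster")
sin <- list(stack(demo), stack(demo))

# Pull the cell values into RAM, but only when raster estimates they fit.
sin_mem <- lapply(sin, function(s) {
  if (!canProcessInMemory(s, n = 1)) {
    stop("stack is too large to hold in memory; use writeRaster() instead")
  }
  readAll(s)
})

rds_path <- file.path(tempdir(), "temp_sinlist.rds")  # hypothetical location
saveRDS(sin_mem, rds_path)

# After reading the file back, the values travel with the object.
restored <- readRDS(rds_path)
all_in_memory <- all(vapply(restored, inMemory, logical(1)))
```

Because the values are stored inside the RDS file, it no longer depends on any other file on disk, at the cost of a larger file and higher memory use.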
Upvotes: 2
Reputation: 160447
I think it makes more sense to wrap the object in a list, which lets you save either a single object or a list of them.
RasterStack2list <- function(object, ...) {
  fn <- normalizePath(slot(object, "name"))
  md5 <- tools::md5sum(fn)
  contents <- readBin(fn, raw(1), n=file.info(fn)$size)
  list(object=object, path=fn, contents=contents, md5=md5)
}
list2RasterStack <- function(object, ..., sametempdir=FALSE, overwrite=FALSE) {
  if (! all(c("md5", "object", "path", "contents") %in% names(object))) {
    warning("no 'path' stored, object might be incomplete", call.=FALSE)
    return(object)
  }
  if (sametempdir) {
    dirn <- dirname(object[["path"]])
    if (!dir.exists(dirn)) dir.create(dirn, recursive=TRUE)
  } else {
    dirn <- tempdir()
  }
  fn <- file.path(dirn, basename(object[["path"]]))
  if (!sametempdir) slot(object[["object"]], "name") <- fn
  if (file.exists(fn) && !overwrite) {
    stop("file exists but 'overwrite' not true: ", sQuote(fn))
  }
  writeBin(object[["contents"]], fn)
  outmd5 <- tools::md5sum(fn)
  if (outmd5 != object[["md5"]]) {
    warning("file contents md5 checksum incorrect; expecting ",
            sQuote(object[["md5"]]), ", actually ", sQuote(outmd5),
            call.=FALSE)
  }
  return(object[["object"]])
}
Intended use, assuming sin is a list of objects:
saveRDS(lapply(sin, RasterStack2list), file="somewhere.rds")
# restart R, different computer, something else ...
newsin <- lapply(readRDS("somewhere.rds"), list2RasterStack)
# should be able to plot newsin[[1]], for instance
If you need to add functionality, here's the basic premise/breakdown:
- the wrapped object is a plain list, so it can be stored with save or saveRDS or perhaps even jsonlite::toJSON if you really want;
- the only slot rewritten on restoration is "name"; if you try this and there are other missing/wrong slots on restoration, you should be able to add steps to preserve them as well;
- I store an md5 checksum b/c I've been bitten by endian-ness when I was intentionally playing and trying to break things (on another project), so it is highly likely to be unnecessary here (but barely slows things down); and
- sametempdir= is meant to try to stiff-arm a problem in case the path is hard-coded in the raster object. I have no reason to believe it is, but test first on the same computer without this option, and if something fails, try setting sametempdir=TRUE. If this latter call works, then these wrapper functions will not support (yet) moving between different computers (that do not share the relevant filesystem).
Upvotes: 0