user321627

Reputation: 2572

For saving a single large object in R, is saveRDS or save faster?

I currently have a very large array (500 elements, each with a 1000 by 20 matrix). I have been using saveRDS to save objects. However, it consistently takes a very long time to do so. I am wondering if save() is faster, or if there are options in each to save things faster? Thanks.

Upvotes: 6

Views: 6812

Answers (2)

Qinsi

Reputation: 820

The qs package provides a faster alternative for saving and loading R objects. You can use the qs::qsave() function to save your object more efficiently.

Here’s how you can use it:

# Install the qs package if you haven't already
install.packages("qs")

# Save your object using qsave
qs::qsave(your_object, file = "path/to/your/object.qs")
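qsave() also takes a preset argument that trades compression ratio for speed, and qs::qread() loads the object back. A minimal sketch (the object name and paths here are placeholders):

```r
# "fast" favors write speed over file size; other presets include
# "balanced", "high", "archive", and "uncompressed"
qs::qsave(your_object, file = "path/to/your/object.qs", preset = "fast")

# Read the object back in
your_object <- qs::qread("path/to/your/object.qs")
```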

Upvotes: 0

hrbrmstr

Reputation: 78832

You can always spelunk a bit in the sources:

saveRDS():

function (object, file = "", ascii = FALSE, version = NULL, compress = TRUE, 
    refhook = NULL) {
...
  .Internal(serializeToConn(object, con, ascii, version, refhook))
}

Eventually: https://github.com/wch/r-source/blob/2c3e0e757e81ca23c34da8dde4ff925bd9d275f0/src/main/serialize.c#L2471-L2536

save():

function (..., list = character(), file = stop("'file' must be specified"), 
    ascii = FALSE, version = NULL, envir = parent.frame(), compress = isTRUE(!ascii), 
    compression_level, eval.promises = TRUE, precheck = TRUE) {
...
  .Internal(saveToConn(list, con, ascii, version, envir,  eval.promises))
}

Eventually: https://github.com/wch/r-source/blob/6ac8f58c608337200f85ea47cba2abc717be6eb5/src/main/saveload.c#L1973-L2041

OR

give it a benchmark (assuming the object is a list of matrix objects):

library(microbenchmark)

set.seed(0)

lapply(1:500, function(i) {
  matrix(sample(20*1000), nrow = 1000, ncol = 20)
}) -> matrix_list

print(str(matrix_list, list.len=5))
## List of 500
##  $ : int [1:1000, 1:20] 17934 5310 7442 11456 18161 4033 17963 18887 13211 12577 ...
##  $ : int [1:1000, 1:20] 2227 4212 2296 2907 6198 3005 10531 2358 9543 15374 ...
##  $ : int [1:1000, 1:20] 5969 11861 11057 11933 7852 17959 14794 530 16811 17003 ...
##  $ : int [1:1000, 1:20] 1073 14634 12948 16282 2087 6687 7992 7640 18482 8043 ...
##  $ : int [1:1000, 1:20] 10900 8249 6059 10767 15541 17139 11663 9010 576 14900 ...
##   [list output truncated]
## NULL

pryr::object_size(matrix_list)
## 40.1 MB

microbenchmark(
  save = save(matrix_list, file = "/tmp/out.rda"),
  saveRDS = saveRDS(matrix_list, file = "/tmp/out.rds"),
  times = 5,
  control = list(warmup = 2)
) -> mb

mb
## Unit: seconds
##     expr      min       lq     mean   median       uq       max neval
##     save 8.571138 8.578461 8.747248 8.650629 8.665557  9.270453     5
##  saveRDS 8.647355 8.655231 9.298947 8.684998 8.772102 11.735052     5

You can experiment with the compress and compression_level arguments of save(), and with the compress argument of saveRDS() (or a gzcon() connection with an explicit level), to see whether changing or removing compression helps.
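For instance, a sketch of the uncompressed and low-compression variants (paths are placeholders; uncompressed files write faster but are larger on disk):

```r
# Skip compression entirely in saveRDS() -- often the fastest write
saveRDS(matrix_list, file = "/tmp/out_nocomp.rds", compress = FALSE)

# Or pass a gzfile() connection with an explicit (low) compression level
con <- gzfile("/tmp/out_gz1.rds", open = "wb", compression = 1)
saveRDS(matrix_list, file = con)
close(con)

# save() exposes the level directly via compression_level
save(matrix_list, file = "/tmp/out_low.rda",
     compress = "gzip", compression_level = 1)
```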

Upvotes: 5

Related Questions