Rushabh Patel

Reputation: 2764

Import .rds file to h2o frame directly

I have a large .rds file saved, and I am trying to import it directly into an H2O frame, because it is not feasible for me to read the file into the R environment and then convert it with as.h2o(). I am looking for a fast and efficient way to do this.

My attempts:

  1. I tried reading the file into R and then converting it into an H2O frame, but this is far too time-consuming.
  2. I tried saving the data in .csv format and importing it with h2o.importFile() with parse = TRUE, but due to memory constraints I was not able to save the complete data frame.

Any suggestions for an efficient way to do this would be highly appreciated.

Upvotes: 2

Views: 514

Answers (1)

Erin LeDell

Reputation: 8819

R's native read/write functions are not very efficient, so I'd recommend using data.table instead. Both options below make use of data.table in some way.

First, I'd recommend trying the following: after installing the data.table package and loading the h2o library, set options("h2o.use.data.table" = TRUE). This makes as.h2o() use data.table underneath for the conversion from an R data.frame to an H2O Frame. One thing to note about how as.h2o() works: it writes the data from R to disk and then reads it back into H2O using h2o.importFile(), H2O's parallel file reader.
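As a rough sketch of this first option, the steps could be wrapped as below. The helper name rds_to_h2o(), the file path, and the frame name are placeholders of mine, not part of h2o, and the sketch assumes both packages are installed and a cluster has already been started with h2o.init().

```r
# Sketch only: rds_to_h2o() is a hypothetical helper, not an h2o function.
# Assumes h2o and data.table are installed and h2o::h2o.init() has been called.
rds_to_h2o <- function(rds_path, frame_id) {
  # Ask as.h2o() to use data.table for the intermediate write to disk.
  options("h2o.use.data.table" = TRUE)

  # The data still has to be loaded into R once.
  df <- readRDS(rds_path)

  # Convert the data.frame to an H2O Frame stored under frame_id.
  h2o::as.h2o(df, destination_frame = frame_id)
}

# Usage (path and frame name are placeholders):
# h2o::h2o.init()
# hf <- rds_to_h2o("big_data.rds", "big_data")
```

Note that with this route R and H2O each hold a copy of the data until the R object is released.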

There is another option, which is effectively the same thing, except that your RAM doesn't need to hold two copies of the data at once (one in R and one in H2O), so it may be more efficient if you are really strapped for resources.

Save the file as a CSV or a zipped CSV. If you are having issues saving the data frame to disk as a CSV, then you should make sure you're using an efficient file writer like data.table::fwrite(). Once you have the file on disk, read it directly into H2O using h2o.importFile().
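A minimal sketch of this two-step route might look like the following. The helper names rds_to_csv() and csv_to_h2o() and all paths are hypothetical, and the sketch assumes data.table and h2o are installed and that the CSV path is readable by the H2O cluster (true for a local cluster started with h2o.init()).

```r
# Sketch only: both helpers are hypothetical wrappers, not library functions.

# Step 1: read the .rds once and write it out with data.table's fast writer.
rds_to_csv <- function(rds_path, csv_path) {
  df <- readRDS(rds_path)
  data.table::fwrite(df, csv_path)
  invisible(csv_path)
}

# Step 2: let H2O parse the file itself, so R does not need to hold the data.
csv_to_h2o <- function(csv_path, frame_id) {
  h2o::h2o.importFile(path = csv_path, destination_frame = frame_id)
}

# Usage (paths and frame name are placeholders):
# rds_to_csv("big_data.rds", "big_data.csv")
# gc()                      # release the R copy before importing
# h2o::h2o.init()
# hf <- csv_to_h2o("big_data.csv", "big_data")
```

Running the two steps separately, ideally in separate R sessions, is what keeps R and H2O from holding the data at the same time. For the zipped variant, recent data.table versions accept compress = "gzip" in fwrite(); treat that as something to check against your installed version.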

Upvotes: 5
