Severin Pappadeux

Reputation: 20080

Fastest way to save/load data.table

I would like to use the fastest available method to store data.tables for further processing.

Something along the lines of:

  1. Read original data from CSV/RDS.
  2. Convert it to a data.table.
  3. Save it into a format optimized for re-reading (RDS doesn't seem to work with data.table, is that right? Is there some other binary option?)
  4. Continue to work with the file from step #3, reading it directly as a data.table over and over again, doing slicing, grouping, plotting, ...

What is the best option for step #3?

Upvotes: 6

Views: 4685

Answers (1)

Severin Pappadeux

Reputation: 20080

OK, here are some measurements on the particular dataset I'm using. It is originally in RDS, and reading it takes 60+ seconds.

After that, the DT was saved in R's internal XDR format (via save()) as well as to an SQLite db, both uncompressed.

  1. The save()/load() pair was fastest: 11.7-11.8 seconds to load.

  2. SQLite (dbReadTable) was pretty close: 12.0-12.1 seconds. The DB file is about 30% smaller, so I can imagine a case where SQLite would be faster than save()/load().
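For anyone who wants to repeat the comparison, here is a rough sketch of how the two load paths could be timed. It assumes the data.table, DBI and RSQLite packages are installed. The small generated table, the file names and the table name "dt" are made up for illustration and are not the dataset measured above, so the timings will differ.

```r
library(data.table)
library(DBI)

# Hypothetical stand-in data; the real dataset from this post is not shown.
set.seed(42)
n <- 1e5
DT <- data.table(id = seq_len(n),
                 x  = rnorm(n),
                 g  = sample(letters, n, replace = TRUE))

# 1. save()/load(), uncompressed
rdata_file <- tempfile(fileext = ".RData")
save(DT, file = rdata_file, compress = FALSE)
env <- new.env()  # load into a separate environment to keep the original DT
t_load <- system.time(load(rdata_file, envir = env))[["elapsed"]]
from_load <- env$DT

# 2. SQLite via DBI/RSQLite
db_file <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_file)
dbWriteTable(con, "dt", DT)
t_sqlite <- system.time(from_sqlite <- dbReadTable(con, "dt"))[["elapsed"]]
dbDisconnect(con)

# dbReadTable() returns a plain data.frame, so convert it by reference
setDT(from_sqlite)

cat("load():", t_load, "s; dbReadTable():", t_sqlite, "s\n")
cat("file sizes (bytes):", file.size(rdata_file), file.size(db_file), "\n")
```

On a table this small the difference is mostly noise; the gap only becomes meaningful on data of realistic size.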

For now save()/load() is the choice for me, and it preserves the data.table class as well.
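A minimal check of the class-preservation point, assuming only the data.table package. The tiny table and the temporary file are hypothetical and only there to make the snippet self-contained.

```r
library(data.table)

# Hypothetical toy table, just to show the round trip
DT <- data.table(id = 1:5, value = c(2.5, 1.0, 4.2, 3.3, 0.7))

f <- tempfile(fileext = ".RData")
save(DT, file = f, compress = FALSE)

rm(DT)    # drop the in-memory copy
load(f)   # recreates the object under its original name, DT

print(class(DT))

# The reloaded object can be used with data.table syntax right away
totals <- DT[, .(total = sum(value)), by = .(odd = id %% 2 == 1)]
print(totals)
```

Note that load() restores the object under the name it was saved with, so the name has to be known (or captured from load()'s return value) when reading the file back.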

Upvotes: 2
