Reputation: 1051
Is there any method to compress a data frame in R? I have an external file that I want to import into a data frame, but since the data is large, reading it causes a memory error. I am not sure compression even makes sense in R, since it builds its data structures in RAM, but anything synonymous to compression would really help me.
Upvotes: 1
Views: 7375
Reputation: 5240
Maybe it's too late to answer this question, but I thought I'd share some recent work in R that allows compressing data frames. There is now a package called fst (Lightning Fast Serialization of Data Frames for R) with which you can create compressed fst objects from your data frame. A detailed explanation can be found in the fst package manual, but I'll briefly show how to use it and how much space an fst object takes. First, let's create a data frame with some data and then check the size of this data frame, as follows:
install.packages("pryr") # for object_size()
library(pryr)
N <- 1000 * 8
M <- 100
df <- data.frame(A = c(rep(strrep("A", M), N), rep(strrep("B", N), N)))
object_size(df)
# 73.3 kB
Now, let's convert this data frame into an fst object, as follows:
install.packages("fst") #install the package
library(fst) #load the package
path <- paste0(tempfile(), ".fst") #create a temporary '.fst' file
write_fst(df, path) #write the dataframe into the '.fst' file
ft <- fst(path) #load the data as an fst object
object_size(ft)
# 2.14 kB
The disk space for the created .fst file is 434 bytes. You can treat the ft object like a normal data frame (as far as I have tried).
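To illustrate that last point, here is a small sketch (assuming the fst package is installed; the column names are made up) showing that an fst object supports data-frame-style access, where only the requested part of the file is actually read from disk:

```r
library(fst)

# Write a small example data frame to a temporary '.fst' file
path <- paste0(tempfile(), ".fst")
write_fst(data.frame(A = letters[1:10], B = 1:10), path)

ft <- fst(path)   # lazy, file-backed fst object

ft[2:4, ]         # data-frame-style row subsetting; only these rows are loaded
ft$B              # column access, read from disk on demand

# Equivalent explicit partial read of columns and a row range
read_fst(path, columns = "A", from = 2, to = 4)
```

This random access is the main practical advantage over a plain compressed file: you never need the full data set in RAM at once.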
Hope this helps.
Upvotes: 3
Reputation: 161
If you have a large data frame, then the ff package might help you store your data with a smaller memory footprint. Take a look at the ff package, which is available on CRAN.
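As a rough sketch (assuming ff is installed; the file name and chunk size are hypothetical), ff keeps the data in a disk-backed file and reads a flat file in chunks, so the whole data set never has to sit in RAM:

```r
library(ff)

# Read a large CSV into a disk-backed ffdf object, chunk by chunk
big <- read.csv.ffdf(file = "big_data.csv", header = TRUE,
                     next.rows = 100000)   # rows read per chunk

dim(big)          # dimensions are known without loading everything
head(big[, 1])    # a column is pulled into memory only on access
```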
Upvotes: 1
Reputation: 49640
The data.table package stores data similarly to data frames but with some added efficiencies; this may compress your data sufficiently.
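For example (a sketch; the file and column names are hypothetical), data.table's fread() reads large delimited files faster and more memory-efficiently than read.csv(), and lets you restrict what gets loaded at all:

```r
library(data.table)

# Read only the columns you actually need; the rest are never materialised
dt <- fread("big_data.csv", select = c("id", "value"))

# Or read just a slice of rows instead of the whole file
first_chunk <- fread("big_data.csv", nrows = 100000)
```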
The more general solution is to load your data into a database instead of directly into R, then pull just the pieces that you need from the database; the sqldf and RSQLite packages may be of help. There used to be a package called SQLiteDF that made this process transparent (the data lived in a database, but you had an object in R that looked and acted like a data frame while pulling its data from the database). Archived copies of that package are available through CRAN, but some work would probably be needed to get it working with recent versions of R (the latest version of the package dates from 2009).
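A minimal sketch of that workflow with sqldf (assuming the package is installed; the file and column names are made up): read.csv.sql() stages the file in a temporary SQLite database and returns only the rows the query selects, so the full file never has to fit in an R data frame.

```r
library(sqldf)

# The staged file is exposed to the query under the table name 'file'
df <- read.csv.sql("big_data.csv",
                   sql = "SELECT id, value FROM file WHERE value > 100")
```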
There are other tools on the CRAN Task View page mentioned in the comments (scroll down to the "Large Memory" section) that discuss other possibilities and how to analyze data that is too large to fit in RAM.
Upvotes: 1