st4co4

Reputation: 477

r data.table: getting an error while loading a very large data file

I have an 8.6GB text file and I'm trying to load it using data.table and fread(). This works well with a 6.0GB file, but not with a larger file. I get the following error:

Registered S3 method overwritten by 'data.table':
method           from
print.data.table     
data.table 1.16.4 using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
Error: cannot allocate vector of size 880.1 Mb

Specs of the PC I'm using: [screenshot of PC specs]

Could you please tell me what I should do? Do I need a better PC, more RAM, or something else?

Upvotes: 1

Views: 86

Answers (1)

Tim G

Reputation: 4147

As @Roland said, it's a memory issue. Depending on what you want to do with the large text file, there are a couple of ways to handle these cases. Some example code would make it easier to give a better answer, as would knowing what you want to do with the text and what form it takes (table, book, unstructured data).

You can try setting R's memory limit to something higher, such as 14 GB (note that memory.limit() only works on Windows and was made defunct in R 4.2.0):

memory.limit(size = 14000)  
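
If you only need part of the table, a related option (not from the original answer) is to have fread() read only the columns you actually need, which lowers peak memory. The column names below are placeholders:

library(data.table)
# Read just two (hypothetical) columns instead of the whole table
dt <- fread("yourfile.txt", select = c("id", "value"))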

Also consider removing large data objects from your environment with rm() when they are no longer needed. You can also save objects to disk as .rds files and then rm() them to free up memory:

# Save an object to a file
saveRDS(object, file = "my_data.rds")
# Remove it from the environment and trigger garbage collection
rm(object)
gc()
# Restore the object later
object <- readRDS(file = "my_data.rds")

Or read the data in row-wise chunks (chunking):

library(data.table)
chunk_size <- 1000000
skip_rows  <- 1   # skip the header line; use 0 if the file has no header
repeat {
    chunk <- fread("yourfile.txt", nrows = chunk_size, skip = skip_rows,
                   header = FALSE)
    if (nrow(chunk) == 0) break
    # Process the chunk here
    skip_rows <- skip_rows + nrow(chunk)
}

You can also use packages like DuckDB, as described in multiple great answers (like this one), or Arrow, as described here. In this great question a user had a folder of multiple large text files that they wanted to traverse. Maybe it helps.
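
As a rough sketch of the DuckDB route (the query, column name, and file name below are placeholders, not taken from the linked answers): DuckDB can scan the CSV on disk and only materialise the query result in R's memory:

library(DBI)
library(duckdb)

con <- dbConnect(duckdb())
# read_csv_auto() scans the file on disk; only the aggregated result is returned to R
res <- dbGetQuery(con, "
  SELECT some_column, COUNT(*) AS n
  FROM read_csv_auto('yourfile.txt')
  GROUP BY some_column
")
dbDisconnect(con, shutdown = TRUE)

arrow::open_dataset() with format = "csv" offers a similar lazy approach.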

Upvotes: 0
