Reputation: 1522
I've got a file with 2m+ lines.
To avoid overloading memory, I want to read these lines in chunks and then perform further processing on the lines in each chunk.
I read that readLines is the fastest, but I could not find a way to read chunks with readLines.
raw = readLines(target_file, n = 500)
But what I'd then want is a readLines call for lines 501:1000, e.g.:
raw = readLines(target_file, n = 501:1000)
Is there a way to do this in R?
Upvotes: 1
Views: 793
Reputation: 1522
Maybe this helps someone in the future:
The readr package has just what I was looking for: a function to read lines in chunks. read_lines_chunked reads a file in chunks of lines and runs a callback on each chunk.
Let f be the function needed for storing a chunk for later use:
f = function(x, pos) {
  # x is the chunk of lines, pos is the line number of the first line in the chunk
  filename = paste("./chunks/chunk_", pos, ".RData", sep = "")
  save(x, file = filename)
}
Then I can use this in the main wrapper as:
read_lines_chunked(file = target_json,
                   chunk_size = 10000,
                   callback = SideEffectChunkCallback$new(f))
Works.
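To pick the chunks up again later, something like this should do (a minimal sketch, assuming the ./chunks/ directory from above and that each file holds the character vector saved under the name x; note that list.files returns the files in alphabetical, not numeric, order):
chunk_files = list.files("./chunks", pattern = "^chunk_.*\\.RData$", full.names = TRUE)
for (chunk_file in chunk_files) {
  load(chunk_file)  # restores the chunk under its saved name, x
  # ... further processing of the lines in x ...
}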
Upvotes: 3
Reputation: 6496
I don't know how many variables (columns) you have, but data.table::fread is a very fast alternative to what you want:
require(data.table)
raw <- fread(target_file)
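If reading the whole file at once is still too much, fread can also pull out a block of rows via its skip and nrows arguments; a rough sketch in the spirit of the question (rows 501 to 1000, with header = FALSE so the first row of the block is not taken as column names):
# skip the first 500 rows, then read the next 500
chunk <- fread(target_file, skip = 500, nrows = 500, header = FALSE)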
Upvotes: 0