ben_aaron

Reputation: 1522

Use readLines in successive chunks in R

I've got a file with more than 2 million lines.

To avoid overloading memory, I want to read these lines in chunks and then do further processing on each chunk.

I read that readLines is the fastest option, but I could not find a way to read chunks with readLines.

raw = readLines(target_file, n = 500)

But what I'd want next is a readLines call for lines 501:1000, e.g.:

raw = readLines(target_file, n = 501:1000)

Is there a way to do this in R?
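
(Edit: from what I can tell, one workaround is to keep a connection open, since successive readLines calls on the same connection continue where the last one stopped — a minimal sketch, with a placeholder chunk size and processing step:)

con <- file(target_file, open = "r")
repeat {
  chunk <- readLines(con, n = 500)   # reads the *next* 500 lines on each call
  if (length(chunk) == 0) break      # stop at end of file
  # ... process `chunk` here ...
}
close(con)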

Upvotes: 1

Views: 793

Answers (2)

ben_aaron

Reputation: 1522

Maybe this helps someone in the future:

The readr package has just what I was looking for: a function to read lines in chunks.

read_lines_chunked reads a file in chunks of lines and runs a callback on each chunk.

Let f be the callback that stores each chunk for later use:

f = function(x, pos){
  # x: character vector with the lines of the current chunk
  # pos: line number of the first line in the chunk
  filename = paste0("./chunks/chunk_", pos, ".RData")
  save(x, file = filename)
}

Then I can use this in the main wrapper as:

library(readr)

read_lines_chunked(file = target_file
                   , chunk_size = 10000
                   , callback = SideEffectChunkCallback$new(f)
                   )

Works.
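
To pull a chunk back in later (a sketch, assuming the ./chunks/ directory written by f above):

# Files are named by the chunk's starting line (pos), e.g. chunk_1.RData.
chunk_files <- list.files("./chunks", pattern = "^chunk_.*\\.RData$", full.names = TRUE)
load(chunk_files[1])   # load() restores the saved object under its original name, `x`
head(x)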

Upvotes: 3

PavoDive

Reputation: 6496

I don't know how many variables (columns) you have, but data.table::fread is a very fast alternative for what you want:

library(data.table)

raw <- fread(target_file)  # fast read of the whole file into a data.table
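
If memory is the concern, fread can also read a slice directly via its skip and nrows arguments, which maps onto the chunked reads from the question (a sketch; header = FALSE keeps the columns aligned across chunks):

# Read only lines 501-1000, mirroring the readLines(target_file, n = 501:1000) idea:
raw <- fread(target_file, skip = 500, nrows = 500, header = FALSE)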

Upvotes: 0
