geotheory

Reputation: 23680

Efficiently reading specific lines from large files into R

I'm surprised by how long it takes R to read in a specific line from a large file (11GB+). For example:

> t0 = Sys.time()
> read.table('data.csv', skip=5000000, nrows=1, sep=',')
      V1       V2 V3 V4 V5   V6    V7
1 19.062 56.71047  1 16  8 2006 56281
> print(Sys.time() - t0)
Time difference of 49.68314 secs

An OS X terminal command can return a specific line almost instantly. Does anyone know a more efficient way in R?
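
For reference, this is the sort of shell one-liner I mean (sed here is just one option; awk or similar would also work):

    sed -n '5000001p' data.csv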

Upvotes: 10

Views: 2300

Answers (1)

Dirk is no longer here

Reputation: 368539

Well, you can use something like this:

 dat <- read.table(pipe("sed -n -e'5000001p' data.csv"), sep=',')

to read just that one line, extracted by shell tools, into R.
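
An equivalent sketch using awk instead of sed; exiting after the match avoids scanning the rest of the file:

    dat <- read.table(pipe("awk 'NR==5000001{print; exit}' data.csv"), sep=',')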

Also note that system.time(someOps) is an easier way to measure time.
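
For example, timing the one-liner above (assuming the same data.csv):

    system.time(dat <- read.table(pipe("sed -n -e'5000001p' data.csv"), sep=','))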

Upvotes: 20
