Reputation: 23680
I'm surprised by how long it takes R to read in a specific line from a large file (11GB+). For example:
> t0 = Sys.time()
> read.table('data.csv', skip=5000000, nrows=1, sep=',')
V1 V2 V3 V4 V5 V6 V7
1 19.062 56.71047 1 16 8 2006 56281
> print(Sys.time() - t0)
Time difference of 49.68314 secs
The OS X terminal can return a specific line in an instant. Does anyone know a more efficient way to do this in R?
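(For comparison, a shell one-liner along these lines prints the target line almost instantly; the exact command is just one example of what I mean by "terminal":)
sed -n '5000001p' data.csv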
Upvotes: 10
Views: 2300
Reputation: 368539
Well, you can use something like this:
dat <- read.table(pipe("sed -n -e'5000001p' data.csv"), sep=',')
to read just the single line extracted by the shell tools.
Also note that system.time(someOps) is an easier way to measure elapsed time.
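Putting the two suggestions together, a minimal sketch (assuming the same data.csv and target line as in the question):
# time reading only the one line that sed extracts from the file
system.time(dat <- read.table(pipe("sed -n -e'5000001p' data.csv"), sep=','))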
Upvotes: 20