Walker in the City

Reputation: 587

Quickly Write Vector to File in R

What is the fastest way to write a vector to a file? I have a character vector of about 2 million elements, and each value is fairly long (around 200 characters). I am currently using:

write(myVector, "myFile.txt")

But this is extremely slow. I have searched for solutions, but the fast writing functions (such as fwrite) appear to accept only a data frame or matrix as input. Thanks!
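A scaled-down, hypothetical reproduction of this setup can help when comparing writers. The sketch below builds 10,000 random 200-character strings (the sizes and the names `pool`, `f_chr`, `f_num` are illustrative assumptions, not the asker's data) and shows how `write()` lays out its output: it defaults to `ncolumns = 1` for character input and `ncolumns = 5` for numeric input.

```r
# Hypothetical test data: a scaled-down stand-in for the question's vector
# (10,000 strings of 200 random letters instead of ~2 million).
set.seed(42)
n <- 1e4
width <- 200
pool <- paste(sample(letters, n * width, replace = TRUE), collapse = "")
myVector <- substring(pool,
                      seq(1, n * width, by = width),
                      seq(width, n * width, by = width))

# write() defaults to ncolumns = 1 for character input: one string per line
f_chr <- tempfile(fileext = ".txt")
write(myVector, f_chr)
chr_lines <- readLines(f_chr)
length(chr_lines)   # 10000

# For numeric input the default is ncolumns = 5: five values per line
f_num <- tempfile(fileext = ".txt")
write(1:10, f_num)
num_lines <- readLines(f_num)
num_lines           # "1 2 3 4 5"  "6 7 8 9 10"

unlink(c(f_chr, f_num))   # remove the temporary files
```

The numeric layout matters when reading the benchmarks below, which time `write()` on a numeric vector rather than a character one.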

Upvotes: 18

Views: 30763

Answers (3)

JeanVuda

Reputation: 1778

You could use data.table's fwrite:

library(data.table) # install if not installed already
fwrite(list(myVector), file = "myFile.csv")
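By default `fwrite()` writes a column-name header line and uses `quote = "auto"`, so values containing the separator (a comma) or double quotes may be wrapped in quotes. If the goal is a plain file with exactly one string per line, as `write()` produces, a minimal sketch (assuming data.table is installed; the example vector is hypothetical) turns both off:

```r
# Hypothetical example vector containing a comma and embedded double quotes
myVector <- c("plain text", "text, with a comma", "text with \"quotes\"")
out <- tempfile(fileext = ".txt")

if (requireNamespace("data.table", quietly = TRUE)) {
  # list(myVector) is a one-column list; col.names = FALSE drops the header row,
  # and quote = FALSE stops fwrite() from wrapping fields in double quotes
  data.table::fwrite(list(myVector), file = out, col.names = FALSE, quote = FALSE)
  print(identical(readLines(out), myVector))
}
```

Only use `quote = FALSE` when the values cannot contain newlines, since an embedded newline would then split one value across two lines.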

Upvotes: 2

Rui Barradas

Reputation: 76575

After trying several options, I found the fastest to be data.table::fwrite. As @Gregor says in his first comment, it is an order of magnitude faster, which is worth loading the extra package. It is also one of the two options that produce bigger files. (The other is readr::write_lines; thanks to Calum You's comment for reminding me of it.)

library(data.table)
library(readr)

set.seed(1)    # make the results reproducible
n <- 1e6
x <- rnorm(n)

t1 <- system.time({
    sink(file = "test_sink.txt")
    cat(x, "\n")
    sink()
})
t2 <- system.time({
    cat(x, "\n", file = "test_cat.txt")
})
t3 <- system.time({
    write(x, file = "test_write.txt")
})
t4 <- system.time({
    fwrite(list(x), file = "test_fwrite.txt")
})
t5 <- system.time({
    write_lines(x, "test_write_lines.txt")
})

rbind(sink = t1[1:3], cat = t2[1:3], 
      write = t3[1:3], fwrite = t4[1:3],
      readr = t5[1:3])
#       user.self sys.self elapsed
#sink        4.18    11.64   15.96
#cat         3.70     4.80    8.57
#write       3.71     4.87    8.64
#fwrite      0.42     0.02    0.51
#readr       2.37     0.03    6.66
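Because speed and file size both vary between writers, it can be convenient to measure them together and clean up automatically. The helper below is a hypothetical sketch (the name `bench_writers` and the writer list are assumptions) that times each writer into a temporary file, records the resulting size in bytes, and deletes the file afterwards. Only base R writers are included so it runs without extra packages; `fwrite` or `write_lines` could be added to `writers` in the same way.

```r
# Hypothetical helper: time each writer and record the size of the file it produces
bench_writers <- function(x, writers) {
  res <- lapply(names(writers), function(nm) {
    path <- tempfile(fileext = ".txt")
    on.exit(unlink(path))   # delete the test file when this function returns
    elapsed <- system.time(writers[[nm]](x, path))[["elapsed"]]
    data.frame(writer = nm, elapsed = elapsed, bytes = file.size(path))
  })
  do.call(rbind, res)
}

set.seed(1)
x <- rnorm(1e5)
writers <- list(
  cat   = function(x, path) cat(x, "\n", file = path),
  write = function(x, path) write(x, file = path)
)
bench_writers(x, writers)
```

Using `tempfile()` with `on.exit()` avoids having to remember a manual `unlink()` at the end, and a writer that fails does not leave files behind.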

In his second comment, Gregor notes that as.list and list behave differently, and the difference matters: fwrite(as.list(x)) writes the vector as one row with many columns, while fwrite(list(x)) writes one column with many rows.
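The difference comes from the structure of the list that fwrite receives, which base R can show directly. In this small sketch (the values are arbitrary), `list(x)` is a single column of length 3, whereas `as.list(x)` is three columns of length 1:

```r
x <- c(1.5, 2.5, 3.5)

one_col <- list(x)      # one list element holding the whole vector
one_row <- as.list(x)   # one list element per value

length(one_col)               # 1
length(one_row)               # 3
dim(as.data.frame(one_col))   # 3 1  (three rows, one column)
dim(as.data.frame(one_row))   # 1 3  (one row, three columns)
```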

The speed difference is also noticeable:

fw1 <- system.time({
    fwrite(as.list(x), file = "test_fwrite.txt")
})
fw2 <- system.time({
    fwrite(list(x), file = "test_fwrite2.txt")
})

rbind(as.list = fw1[1:3], list = fw2[1:3])
#        user.self sys.self elapsed
#as.list      0.67     0.00    0.75
#list         0.19     0.03    0.11

Finally, clean up the test files:

unlink(c("test_sink.txt", "test_cat.txt", "test_write.txt",
         "test_fwrite.txt", "test_fwrite2.txt", "test_write_lines.txt"))

Upvotes: 22

IRTFM

Reputation: 263421

I found writeBin to be about twice as fast as fwrite. Try this:

zz <- file("myFile.txt", "wb")                   # open a binary-mode connection
writeBin(paste(myVector, collapse = "\n"), zz)   # one newline-separated string
close(zz)
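One caveat: when writeBin() is given a character string, it writes the string followed by a NUL terminator byte, and `paste(..., collapse = "\n")` leaves no newline after the last value. A hedged variant, assuming plain line-oriented text is wanted, converts the string to raw bytes with charToRaw() first, so writeBin() writes exactly those bytes. The helper name `write_lines_raw` is hypothetical.

```r
# writeBin() on a character string appends a NUL (00) byte
writeBin("ab", raw())   # raw bytes 61 62 00

# Hypothetical helper: newline-terminated lines written as raw bytes, no trailing NUL
write_lines_raw <- function(x, path) {
  bytes <- charToRaw(paste0(paste(x, collapse = "\n"), "\n"))
  con <- file(path, "wb")
  on.exit(close(con))   # close the connection even if writing fails
  writeBin(bytes, con)
  invisible(path)
}

myVector <- c("alpha", "beta", "gamma")
out <- tempfile(fileext = ".txt")
write_lines_raw(myVector, out)
readLines(out)     # "alpha" "beta" "gamma"
file.size(out)     # 17 bytes: 5 + 4 + 5 characters plus 3 newlines
```

charToRaw() keeps the string's bytes as they are, so the file uses whatever encoding the strings already have.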

Using the same timing approach as Rui, I get the following on an older machine:

            user.self sys.self elapsed
sink            9.650    7.900  17.418
cat             6.507    7.870  14.254
write           6.436    7.849  14.171
fwrite          0.500    0.051   0.593
write_lines     4.337    0.150   4.451
writeBin        0.238    0.006   0.242 

Upvotes: 8
