Reputation: 395
I have a human-readable file containing 1 billion doubles, all written on a single line (1 billion columns).
The file itself is only around 8 GB, and I am using
fread("filename.data", sep=" ", header=FALSE, data.table=TRUE, showProgress=TRUE)
to load them into an R session. The script is always "Killed", and the most information I get, even with showProgress, is:

*** caught segfault ***
address 0x7efc7bed2010, cause 'memory not mapped'
I've loaded much larger files (by raw size) with the same approach in the past, but those were probably in "matrix form" with far fewer columns. My guess is that data.table has to store 1 billion column names, which costs a lot of memory. Is this correct?
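As a rough, scaled-down check of that guess, consider what the names alone would cost. In this sketch 1e6 stands in for the file's 1e9 columns (an illustrative assumption; generating 1e9 names would itself exhaust memory):

# Scaled-down sketch: the cost of column names alone.
# 1e6 here stands in for the question's 1e9 columns.
nm <- sprintf("V%d", seq_len(1e6))   # the default names fread assigns when header=FALSE
print(object.size(nm), units = "MB") # already tens of MB; roughly 1000x that at 1e9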
Is there a way to make fread read the data straight into a row vector (as opposed to transposing after reading)?

Upvotes: 1
Views: 335
Reputation: 27732
fread a single row as a single column?
Here you go:
library(data.table)
# read using the default (auto-detected) separator
fread('v1,v2,v2,v3
this, is, a, test
of, fread,one,line')
# v1 v2 v2 v3
# 1: this is a test
# 2: of fread one line
# read each line whole, as a single column (sep = "")
fread('v1,v2,v2,v3
this, is, a, test
of, fread,one,line', sep = "", header = FALSE)
# V1
# 1: v1,v2,v2,v3
# 2: this, is, a, test
# 3: of, fread,one,line
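Applied to the file from the question, sep = "" would hand back the entire 8 GB line as a single one-cell string, which would still need splitting. A minimal sketch of an alternative, assuming the file really is space-separated doubles on one line: base R's scan reads straight into a plain numeric vector, so no per-column names are created at all.

# Read the doubles directly into a numeric vector with base R's scan().
# A plain vector carries no column names, avoiding the suspected 1e9-name
# overhead. Assumes "filename.data" is space-separated, as in the question.
x <- scan("filename.data", what = double(), sep = " ", quiet = TRUE)
length(x)  # ~1e9; x is already the "row vector" the question asks for

Keep in mind that 1e9 doubles occupy about 8 GB of RAM on their own (8 bytes each), so this only fits on a machine with headroom beyond that.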
Upvotes: 1