algae

Reputation: 395

Memory requirement using `fread()` for large column vector

I have a human-readable file containing 1 billion doubles all written in a single line (1 billion columns).

The file itself is only around 8G and I am using

fread("filename.data", sep=" ", header=FALSE, data.table=TRUE, showProgress=TRUE)

to load them into an R session. The script is always "Killed", and the most information I get from showProgress is

*** caught segfault ***
address 0x7efc7bed2010, cause 'memory not mapped'

I've loaded much larger files (in raw size) using the same approach in the past, but those were probably in "matrix form", with far fewer columns. I'm guessing that data.table needs to store 1 billion column names, which is costing a lot of memory... Is this correct?

  1. Is there no way to fread straight into a row vector (as opposed to transposing after reading)?
  2. Would this data be salvageable, or do I need to re-write it as a column vector (one value per line)?
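For comparison, here is a minimal sketch of what I mean by reading straight into a vector, using base R's scan() (assuming the file really is one whitespace-separated line; filename.data is the same file as in the call above). scan() returns a flat double vector and allocates no column names:

# read the whole file as a flat numeric vector instead of a
# 1-billion-column data.table; scan's default separator is any whitespace
x <- scan("filename.data", what = double())

length(x)  # expected to be 1e9
head(x)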

Upvotes: 1

Views: 335

Answers (1)

Wimpel

Reputation: 27732

fread single row as single column?

Here you go:

library(data.table)

#read using default separators
fread('v1,v2,v2,v3
this, is, a, test
of, fread,one,line')

#      v1    v2  v2   v3
# 1: this    is   a test
# 2:   of fread one line

#read one column per line/row
fread('v1,v2,v2,v3
this, is, a, test
of, fread,one,line', sep = "", header = FALSE)

#                    V1
# 1:        v1,v2,v2,v3
# 2:  this, is, a, test
# 3: of, fread,one,line
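The same trick applied to a file on disk (a hypothetical file name, just for illustration): with sep = "" every line is read as one row of a single V1 column, so a value-per-line file comes back as one long column.

# write a tiny value-per-line file (hypothetical name), then read it back
# with sep = "" so each line becomes one row of a single V1 column
writeLines(as.character(c(1.5, 2.5, 3.5)), "values_by_line.txt")
dt <- fread("values_by_line.txt", sep = "", header = FALSE)

# fread documents sep = "" as reading each line into a single character
# column (like readLines), so convert explicitly to get doubles
x <- as.numeric(dt$V1)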

Upvotes: 1
