lolibility

Reputation: 2187

Why would a read-in variable consume far more memory than the file's storage size in R?

When I tried to read a big file with an on-disk size of 672MB into R, system memory usage jumped from 0.98 GB to 3.6 GB (I'm using a desktop with 4 GB of memory). That means the file takes several times its storage size to hold in memory, and I can't do any calculations after reading it in because I run out of memory. Is that normal? The code I used:

a=read.table(file.choose(),header=T,colClasses="integer",nrows=16777777,comment.char="",sep="\t")

The file contains 167772XX lines.
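A quick way to confirm how much memory the resulting object itself occupies (a sketch, assuming the data frame a created by the call above):

print(object.size(a), units = "MB")   # in-memory size of the data frame
gc()                                  # R's own accounting of memory in use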

gc() before and after the run: [screenshot of gc() output]

I'm not sure what this means.

Upvotes: 1

Views: 130

Answers (1)

Joshua Ulrich

Reputation: 176648

Your text file is 672MB. Assuming all your integers are 1 digit, it's perfectly reasonable that your R object is about 2*672MB.

Each character in a text file is 1 byte. R stores integers in 4 bytes (see ?integer). That means your file contains ~336MB of "\t" and ~336MB of integers stored as 1-byte characters.
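You can check the 4-bytes-per-integer figure directly; a minimal example (the small constant on top of the 4 bytes per value is the vector's fixed header):

object.size(integer(1e6))   # ~4000048 bytes: 4 bytes per value plus a fixed header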

R reads those 1-byte characters and stores them as 4-byte integers, so: 336MB * 4 = 1344MB. The second row, second column of your gc output reads 1345.6, which equals that 1344MB plus the original 1.6MB.
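The same back-of-the-envelope arithmetic in R (a sketch; it assumes every value is 1 digit, so half the file's bytes are tab/newline separators):

file_mb   <- 672
values_mb <- file_mb / 2    # bytes that are actual digits: one byte per value
r_mb      <- values_mb * 4  # each 1-byte digit becomes a 4-byte integer
r_mb                        # 1344, matching gc()'s 1345.6 minus the ~1.6MB baseline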

Upvotes: 6
