Reputation: 135
I am using R for some data analysis. System specs: i5 + 4GB RAM. For some reason, my R session takes up a chunk of RAM far bigger than my data, which leaves very little room for other operations.
I read a 550MB csv file, and the memory taken by R was 1.3-1.5GB. I then saved the data as a .RData file (file size: 183MB). Loading that file back into R takes 780MB. Any idea why this could be happening and how to fix it?
Edits: The file has 123 columns and 1,190,387 rows. The variables are of type num and int.
Upvotes: 6
Views: 19478
Reputation: 489
I assume you are using read.csv(), which is based on read.table().
The problem with these functions is that they fragment memory horribly, and since the R garbage collector cannot move allocated objects to consolidate the fragmented space (a shortcoming of R's garbage collector), you are stuck with that fragmentation.
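If you stay with read.csv(), its help page notes that supplying colClasses (and, to a lesser extent, nrows) up front reduces memory use during the read, since R no longer has to guess column types and re-allocate as it goes. A minimal sketch, assuming 3 int columns followed by 120 num columns (the file name and the type split are placeholders; adjust them to your data):

    # telling read.csv the types and row count up front avoids repeated
    # re-allocation while the file is parsed
    col_types <- c(rep("integer", 3), rep("numeric", 120))  # placeholder split
    df <- read.csv("yourfile.csv",
                   colClasses = col_types,
                   nrows = 1190387)  # row count from the question's edit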
Upvotes: 1
Reputation: 701
(This overlaps somewhat with the previous answers.)
You may use read_csv() or read_table() from the readr package, which load data faster.
Use gc() and mem_change() to check the change in memory and identify which step leads to the increase.
You can also open a connection and read the data in chunks.
Or put the data in a database and query it through RPostgreSQL, RSQLite, or RMySQL; check dbConnect, dbWriteTable, and dbGetQuery. A sketch of these options follows below.
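A minimal sketch of those routes; the file name data.csv, the table name mydata, and the columns col1/col2 are placeholders, and mem_change() comes from the pryr package:

    library(readr)
    library(pryr)

    # faster reader, and measure what the load actually costs
    mem_change(df <- read_csv("data.csv"))

    # chunked reading: each chunk is reduced to its column means, so the
    # whole table never has to sit in memory at once
    means <- read_csv_chunked(
      "data.csv",
      callback = DataFrameCallback$new(function(chunk, pos) {
        as.data.frame(t(colMeans(chunk[sapply(chunk, is.numeric)])))
      }),
      chunk_size = 100000
    )

    # or keep the data in SQLite and pull out only what you need
    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), "data.sqlite")
    dbWriteTable(con, "mydata", df)
    res <- dbGetQuery(con, "SELECT col1, col2 FROM mydata WHERE col1 > 0")
    dbDisconnect(con)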
It is hard to say more without a reproducible example.
Upvotes: 0
Reputation: 14665
A numeric value (double precision floating point) is stored in 8 bytes of RAM.
An integer value (in this case) uses 4 bytes.
Your data has 1,190,387 * 123 = 146,417,601 values.
If all columns were numeric, that would make 1,171,340,808 bytes of RAM (~1.09GB).
If all were integer, 585,670,404 bytes would be needed (~558MB).
Your 780MB sits between those two bounds, so it makes perfect sense for a mix of numeric and integer columns.
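You can verify the arithmetic in R itself:

    rows <- 1190387
    cols <- 123
    rows * cols                  # 146,417,601 values
    rows * cols * 8 / 1024^3     # ~1.09 GB if every column were numeric
    rows * cols * 4 / 1024^3     # ~0.55 GB (~558MB) if every column were integer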
Very General Advice:
Upvotes: 23
Reputation: 60984
R probably uses more memory because of copying of objects. Although these temporary copies get deleted, R still occupies the space. To give this memory back to the OS, you can call the gc function; however, gc is called automatically anyway when the memory is needed.
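For example, after removing a large temporary object, an explicit gc() call collects it and reports how much memory R is holding:

    x <- rnorm(5e7)   # ~400MB of doubles
    rm(x)             # the object is gone, but R may keep the space
    gc()              # force a collection; prints the cells/Mb used and the GC triggers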
In addition, a 550 MB csv file does not necessarily map to 550 MB in R. This depends on the data types of the columns (double, integer, character), which all use different amounts of memory.
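A quick illustration of how much the column type matters, using base R's object.size():

    x <- 1:1e6            # one million values as integers
    object.size(x)        # ~4 MB: 4 bytes per value
    y <- as.numeric(x)    # the same values as doubles
    object.size(y)        # ~8 MB: 8 bytes per value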
The fact that your RData file is smaller is not strange, since R compresses the data when saving; see the documentation of save.
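save() compresses with gzip by default, and the compress argument lets you trade write speed for file size (the object and file names here are placeholders):

    save(df, file = "data.RData")                      # gzip compression by default
    save(df, file = "data_xz.RData", compress = "xz")  # smaller file, slower to write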
Upvotes: 6