Macbook

Reputation: 135

Loading/Reading data in R taking up too much memory

I am using R for some data analysis. System specs: i5 + 4 GB RAM. For some reason my R session takes up far more RAM than the size of my data, which leaves me with very little room for other operations.

I read a 550 MB CSV file; memory taken by R: 1.3-1.5 GB. I saved the data as a .RData file; file size: 183 MB. I loaded that file back into R; memory taken by R: 780 MB. Any idea why this could be happening and how to fix it?

Edit: the file has 123 columns and 1,190,387 rows. The variables are of type num and int.

Upvotes: 6

Views: 19478

Answers (4)

Nikolai

Reputation: 489

I assume you are using read.csv() which is based on read.table().

The problem with these functions is that they fragment memory badly. And since R's garbage collector cannot move allocated objects to compact the fragmented heap (a shortcoming of the R garbage collector), you are stuck with the following workaround:

  • Read the data via read.table.
  • Save it via save().
  • Kill R.
  • Load data via load().
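
The steps above can be sketched as follows; a tiny temp file stands in for the real 550 MB CSV, and the file paths are placeholders:

```r
# Minimal sketch of the read / save / restart / load workflow.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10, y = rnorm(10)), csv_path, row.names = FALSE)

# 1. Read the data (this is the step that fragments memory)
df <- read.table(csv_path, header = TRUE, sep = ",")

# 2. Save a compressed binary copy
rdata_path <- tempfile(fileext = ".RData")
save(df, file = rdata_path)

# 3. Kill R here and start a fresh session, then:

# 4. Load the object back; the fresh heap is not fragmented
load(rdata_path)
```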

Upvotes: 1

StayLearning

Reputation: 701

(This overlaps somewhat with the previous answers.)

You can use read_csv() or read_table() from the readr package, which load data faster than their base-R counterparts.
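
A minimal sketch, assuming readr is installed (the toy file stands in for your real CSV):

```r
library(readr)

path <- tempfile(fileext = ".csv")
write_csv(data.frame(a = 1:5, b = letters[1:5]), path)

# read_csv() is faster than read.csv() and never converts
# strings to factors; column types are guessed from the data.
df <- read_csv(path)
```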

Use gc() and mem_change() (from the pryr package) to track memory and identify which step causes the increase.

You can also open a connection and read the data in chunks.
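
Chunked reading over a connection might look like this sketch (chunk size and file are illustrative):

```r
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:100), path, row.names = FALSE)

con <- file(path, open = "r")
header <- readLines(con, n = 1)      # consume the header row

chunk_size <- 25
total <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  chunk <- read.csv(text = paste(lines, collapse = "\n"), header = FALSE)
  total <- total + sum(chunk[[1]])   # analyze, keep results, discard chunk
}
close(con)
```

Only one chunk is in memory at a time; `total` accumulates the result across chunks.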

Or put the data in a database and query it through RPostgreSQL, RSQLite, or RMySQL; see dbConnect(), dbWriteTable(), and dbGetQuery().
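
A sketch with RSQLite (the table name and data are placeholders):

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), tempfile(fileext = ".sqlite"))
dbWriteTable(con, "mydata", data.frame(id = 1:3, val = c(10, 20, 30)))

# Only the query result is pulled into R's memory, not the whole table
res <- dbGetQuery(con, "SELECT SUM(val) AS total FROM mydata")
dbDisconnect(con)
```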

It is hard to say more without a reproducible example.

Upvotes: 0

bdemarest

Reputation: 14665

A numeric value (double-precision floating point) is stored in 8 bytes of RAM.
An integer value (in this case) uses 4 bytes.
Your data has 1,190,387 * 123 = 146,417,601 values.
If all columns are numeric, that makes 1,171,340,808 bytes of RAM (~1.09 GB).
If all are integer, 585,670,404 bytes are needed (~558 MB).

So it makes perfect sense that your data uses 780 MB of RAM: a mix of numeric and integer columns lands between those two bounds.
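
The arithmetic above can be checked directly in R:

```r
n_rows <- 1190387
n_cols <- 123
n_vals <- n_rows * n_cols          # 146,417,601 values

bytes_if_numeric <- n_vals * 8     # 8 bytes per double
bytes_if_integer <- n_vals * 4     # 4 bytes per integer

bytes_if_numeric / 1024^3          # ~1.09 GB
bytes_if_integer / 1024^2          # ~558 MB
```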

Very General Advice:

  1. Convert your data.frame to a matrix. Matrix operations often have less overhead.
  2. Try R package bigmemory: http://cran.r-project.org/web/packages/bigmemory/index.html
  3. Buy more ram. Possibly your machine can support up to 16GB.
  4. Don't load all your data into ram at the same time. Load subsets of rows or columns, analyze, save results, repeat.
  5. Use a very small test dataset to design your analysis, then analyze the full dataset on another machine/server with more memory.

Upvotes: 23

Paul Hiemstra

Reputation: 60984

R probably uses more memory because of copying of objects. Although these temporary copies are deleted, R still holds on to the space. To give this memory back to the OS you can call the gc function manually; otherwise, gc is called automatically when more memory is needed.
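
For example, a small sketch of forcing a collection after dropping an object:

```r
x <- matrix(rnorm(1e6), ncol = 100)  # allocate ~8 MB
rm(x)                                # object gone, but memory may linger

gc()                                 # force collection; returns a usage summary
```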

In addition, it is not a given that a 550 MB CSV file maps to 550 MB in R. This depends on the data types of the columns (float, int, character), which all use different amounts of memory.

The fact that your .RData file is smaller is not strange, as R compresses the data when saving; see the documentation of save.

Upvotes: 6
