ozdrgnaDiies

Reputation: 1929

Why does Ruby use so much memory storing large arrays?

I've been working on a project the past few days that deals with reading somewhat large files. To be specific, it reads them line by line to parse data to insert into a database.

While running tests on these files, I found that for a 400MB file, Ruby allocated over 1.2GB of memory, about 3 times the size of the file itself. More troubling, once that memory was taken, Ruby didn't want to let most of it go. Manually running GC recovers only about a third of the allocated memory, meaning a 400MB file split into lines takes up over 800MB of memory.

This is really puzzling to me, and I'm wondering if there is something I am doing wrong. Here is my code that replicates the problem:

text = File.read("somefile.txt").split("\n")

I don't see how the lines being in an array is doubling its size in memory.

In another scenario, filling an array with 10 million characters leads to a ratio of 40 bytes of memory per 1 byte of data, although I'm ready to attribute this to string metadata.

For reference, I am using Ruby 2.1.5p273 [i386-mingw32] on Windows 8.

Also, before I get answers telling me to read lines another way: I already know some alternatives. This is just a question on memory consumption.

Upvotes: 2

Views: 1536

Answers (1)

Chris Heald

Reputation: 62638

Everything in Ruby is an object, which in C is an RVALUE (a struct that describes the object and holds a pointer to the memory allocated for its value). IIRC an RVALUE is 40 bytes on a 64-bit machine (five machine words, so roughly half that on a 32-bit build like yours), and any heap memory for the object's value comes on top of that.
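To get a rough feel for that per-object cost, here is a minimal sketch using `ObjectSpace.memsize_of` from the standard `objspace` library. The line count and contents are arbitrary choices for illustration, and the numbers it prints are only a hint: they vary by Ruby version and platform, and older versions may not count the slot itself.

```ruby
require 'objspace'

# Many tiny strings, similar to splitting a file into short lines.
# The count and contents are made up for illustration.
lines = Array.new(100_000) { |i| "line #{i}" }

# Bytes of actual text stored across all the strings.
payload_bytes = lines.inject(0) { |sum, s| sum + s.bytesize }

# What the interpreter attributes to each string object. This is an
# estimate; the exact figure depends on the Ruby version and platform.
object_bytes = lines.inject(0) { |sum, s| sum + ObjectSpace.memsize_of(s) }

# The array holds one reference per element on top of the strings.
array_bytes = ObjectSpace.memsize_of(lines)

ratio = (object_bytes + array_bytes).to_f / payload_bytes

puts "payload:        #{payload_bytes} bytes"
puts "string objects: #{object_bytes} bytes"
puts "array itself:   #{array_bytes} bytes"
puts "overhead ratio: #{ratio.round(2)}"
```

With short lines, the fixed cost per object dominates, which is why the ratio of memory to data gets worse the smaller each string is.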

Ruby allocates RVALUEs in "heaps" (not the heap): chunks of memory with N slots each, where one slot holds one RVALUE. When a heap fills up, Ruby runs GC to try to free slots, and if that doesn't free enough, it allocates another heap to hold the additional RVALUEs. It's difficult to get Ruby to release a heap once allocated, because the heap has to be completely empty before it can be freed, and Ruby is constantly allocating new objects.
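You can watch this happen with `GC.stat`. The sketch below is only an illustration: the key that reports the number of these chunks was renamed between versions (`:heap_used` in 2.1, `:heap_allocated_pages` from 2.2 on, as far as I know), so the `heap_pages` helper is a hypothetical wrapper that tries both. How far the count drops after GC depends on the version and on what else is alive.

```ruby
# Hypothetical helper: number of slot-holding chunks the interpreter
# currently has allocated. The key name differs across Ruby versions.
def heap_pages
  stats = GC.stat
  stats[:heap_allocated_pages] || stats[:heap_used]
end

before = heap_pages

# Fill a large number of slots with objects that stay referenced.
objects = Array.new(500_000) { Object.new }
during = heap_pages

# Drop the references and ask for a collection.
objects = nil
GC.start
after = heap_pages

puts "pages before: #{before}"
puts "pages with live objects: #{during}"
puts "pages after GC: #{after}"
```

Comparing the last two numbers shows how much of the slot memory the interpreter actually gave up once the objects were gone.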

In general, you should avoid allocating large numbers of objects that stay referenced and so can't be released, because you end up allocating multiple RVALUE heaps that Ruby then has a hard time letting go of.
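I know you said you're aware of the alternatives, but for completeness, this is a minimal sketch of what that advice looks like for your case: process each line as it is read so it can be collected right away. `handle_line` and `process_file` are hypothetical names, and the temporary file only exists so the example runs on its own.

```ruby
require 'tempfile'

# Hypothetical stand-in for parsing a line and inserting it into a database.
def handle_line(line)
  line.length
end

# Read one line at a time; no line is kept after its iteration ends,
# so the number of live string objects stays small.
def process_file(path)
  count = 0
  File.foreach(path) do |line|
    handle_line(line.chomp)
    count += 1
  end
  count
end

# Small temporary file so the sketch is self-contained.
sample = Tempfile.new('lines')
sample.write("a\nb\nc\n")
sample.close

processed = process_file(sample.path)
puts "processed #{processed} lines"

sample.unlink
```

The strings are still allocated, but because none of them stay referenced, the same slots get reused instead of forcing Ruby to allocate more heaps.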

Upvotes: 10
