vrepsys

Reputation: 2213

Why does repeated JSON parsing consume more and more memory?

It seems that parsing the same JSON file over and over again in Ruby consumes more and more memory. Consider the code and output below:

  1. Why isn't the memory freed up after the first iteration?
  2. Why does a 116MB JSON file take up 1.5GB of RAM after parsing? It's surprising considering the text file is converted into hashes. What am I missing here?

Code:

require 'json'

# Resident set size (RSS) of the current process, in MB
def memused
  `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1] / 1024
end

text = IO.read('../data-grouped/2012-posts.json')
puts "before parsing: #{memused}MB"
iter = 1
while true
  items = JSON.parse(text)
  GC.start
  puts "#{iter}: #{memused}MB"
  iter += 1
end

Output:

before parsing: 116MB
1: 1840MB
2: 2995MB
3: 2341MB
4: 3017MB
5: 2539MB
6: 3019MB

Upvotes: 12

Views: 2902

Answers (1)

Thiago Lewin

Reputation: 2828

When Ruby parses a JSON file, it creates many intermediate objects along the way. These objects stay in memory until the GC starts working.

If the JSON file has a complicated structure, with many arrays and nested objects, the number of intermediate objects grows quickly as well.

Did you try calling GC.start to suggest that Ruby clean up unused memory? If the amount of memory decreases significantly, that suggests it was mostly intermediate objects used to parse the data; otherwise, your data structure is complex or there is something in your data that the library can't deallocate.
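
For instance, a minimal check along those lines (a sketch reusing the memused helper and file path from the question) might look like this:

require 'json'

# Resident set size (RSS) of the current process, in MB (same helper as in the question)
def memused
  `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1] / 1024
end

text = IO.read('../data-grouped/2012-posts.json')  # path taken from the question

items = JSON.parse(text)
puts "after parse:     #{memused}MB"

GC.start
puts "after GC.start:  #{memused}MB"   # a large drop here points at intermediate parser objects

items = nil                            # release the reference to the parsed hashes/arrays
GC.start
puts "after releasing: #{memused}MB"   # measure again once the parsed data itself is collectable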

For large JSON processing I use yajl-ruby (https://github.com/brianmario/yajl-ruby). It is implemented in C and has a low memory footprint.
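
A rough sketch of parsing straight from the file with yajl-ruby (assuming the gem is installed and using the file path from the question) could look like:

require 'yajl'

# Parsing from an IO avoids holding the entire JSON text in a Ruby String
# alongside the parsed result.
items = File.open('../data-grouped/2012-posts.json', 'r') do |file|
  Yajl::Parser.parse(file)
end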

Upvotes: 4
