Reputation: 51
I'm developing a Rails application that deals with huge amounts of data, and it halts because it uses up all the memory on my computer due to a memory leak (allocated objects that are never released).
In my application, data is organized hierarchically, as a tree, where each node of level "X" contains the sum of the data of its children at level "X+1". For example, if level "X+1" holds the number of people in each city, level "X" holds the number of people in each state. In this way, level "X"'s data is obtained by summing up the data in level "X+1" (in this case, people).
For the sake of this question, consider a tree with four levels: Country, State, City and Neighbourhood, where each level is mapped to an ActiveRecord table (countries, states, cities, neighbourhoods).
Data is read from a csv file that fills the leaves of the tree, that is, the neighbourhoods table.
After that, data flows from bottom (neighbourhoods) to top (countries) in the following sequence:
1) Neighbourhoods data is summed to Cities;
2) after step 1 is completed, Cities data is summed to States;
3) after step 2 is completed, States data is summed to Country;
The schematic code I'm using is as follows:
1 cities = City.all
2 cities.each do |city|
3 city.data = 0
4 city.neighbourhoods.each do |neighbourhood|
5 city.data = city.data + neighbourhood.data
6 end
7 city.save
8 end
The lowest level of the tree contains 3.8M records. Each time lines 2-8 are executed, one city is summed up; after line 8 is executed, that subtree is no longer needed, but it is never released (memory leak). After summing 50% of the cities, all 8 GB of my RAM is gone.
My question is: what can I do? Buying better hardware will not do, since I'm working with a "small" prototype.
I know a way to make it work: restart the application for each City, but I hope someone has a better idea. The "simplest" would be to force the garbage collector to free specific objects, but it seems there is no way to do that (https://www.ruby-forum.com/t/how-do-i-force-ruby-to-release-memory/195515).
From the following articles I understood that the developer should organize the data in a way that "suggests" to the garbage collector what should be freed. Maybe another approach will do the trick, but the only alternative I see is a depth-first search approach instead of the reversed breadth-first search I'm using, and I don't see why it should work.
What I read so far:
https://stackify.com/how-does-ruby-garbage-collection-work-a-simple-tutorial/
https://www.toptal.com/ruby/hunting-ruby-memory-issues
https://scoutapm.com/blog/ruby-garbage-collection
https://scoutapm.com/blog/manage-ruby-memory-usage
Thanks
Upvotes: 1
Views: 1898
Reputation: 102249
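This isn't really a memory leak. You're just indiscriminately loading data from the table, which exhausts the available memory. City.all followed by each keeps every city in the loaded collection, and each city keeps its loaded neighbourhoods, so nothing ever becomes unreachable for the garbage collector.
The solution is to load the data from the database in batches: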
City.find_each do |city|
city.update(data: city.neighbourhoods.sum(&:data))
end
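To illustrate why batching helps, here is a minimal pure-Ruby sketch that does not touch Rails at all. FakeCity, FakeNeighbourhood and each_city_in_batches are hypothetical stand-ins invented for this example; the real find_each fetches records from the database in batches (1000 by default), which this only imitates.

```ruby
# Stand-ins for ActiveRecord models; this is NOT Rails code.
class FakeNeighbourhood
  attr_reader :data

  def initialize(data)
    @data = data
  end
end

class FakeCity
  attr_reader :id
  attr_accessor :data

  def initialize(id)
    @id = id
    @neighbourhoods = nil
  end

  # Memoised like a loaded association: once built, the children stay
  # reachable for as long as the city object itself is reachable.
  def neighbourhoods
    @neighbourhoods ||= Array.new(1_000) { |i| FakeNeighbourhood.new(i) }
  end

  def neighbourhoods_loaded?
    !@neighbourhoods.nil?
  end
end

# Hypothetical helper: builds cities one batch at a time and keeps no
# reference to a batch once it has been processed.
def each_city_in_batches(ids, batch_size: 100)
  ids.each_slice(batch_size) do |slice|
    slice.map { |id| FakeCity.new(id) }.each { |city| yield city }
  end
end

ids = (1..500).to_a

# Pattern from the question: one array holds every city, so every loaded
# child is still referenced when the loop ends.
all_cities = ids.map { |id| FakeCity.new(id) }
all_cities.each { |city| city.data = city.neighbourhoods.sum(&:data) }
retained = all_cities.count(&:neighbourhoods_loaded?)

# Batched pattern: only the current batch of cities is referenced, so
# earlier batches (and their children) can be garbage collected.
totals = {}
each_city_in_batches(ids) do |city|
  totals[city.id] = city.neighbourhoods.sum(&:data)
end
```

In the first loop all 500 cities still hold their children afterwards; in the second only the small totals hash survives each batch.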
If neighbourhoods.data is a simple integer, you don't need to fetch the records in the first place:
City.update_all(
'data = (SELECT SUM(neighbourhoods.data) FROM neighbourhoods WHERE neighbourhoods.city_id = cities.id)'
)
This will be an order of magnitude faster and use a trivial amount of memory, as all the work is done in the database.
If you REALLY want to load a bunch of records into Rails, then make sure to select aggregates instead of instantiating all those nested records:
# n_data is the SQL alias; it is readable on each City returned by the query
City.left_joins(:neighbourhoods)
    .group(:id)
    .select(:id, 'SUM(neighbourhoods.data) AS n_data')
    .find_each { |city| city.update(data: city.n_data) }
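The idea of aggregating without instantiating nested records can be sketched in plain Ruby. The rows below are made-up [city_id, data] pairs, and sum_by_city is a hypothetical helper; in a Rails app, assuming a Neighbourhood model, similar pairs could come from Neighbourhood.pluck(:city_id, :data), or the grouped sums could be fetched directly with Neighbourhood.group(:city_id).sum(:data).

```ruby
# Hypothetical raw rows of [city_id, data]; in a real app they would come
# from the database rather than a literal array.
rows = [
  [1, 120],
  [1, 80],
  [2, 300],
  [3, 50],
  [3, 25],
  [3, 25]
]

# Sum per city in a single pass. Only integers and one Hash are allocated;
# no model objects are built for the individual neighbourhoods.
def sum_by_city(rows)
  rows.each_with_object(Hash.new(0)) do |(city_id, data), totals|
    totals[city_id] += data
  end
end

city_totals = sum_by_city(rows)
```

The resulting hash maps each city id to its total, which is all that is needed to write the sums back.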
Upvotes: 2
Reputation: 12550
You don't need Rails for this at all; plain SQL should be good enough for what you're trying to do:
City.connection.execute(<<-SQL.squish)
  UPDATE cities SET data = (
    SELECT SUM(neighbourhoods.data)
    FROM neighbourhoods
    WHERE neighbourhoods.city_id = cities.id
  )
SQL
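The same statement could be repeated for the upper levels of the tree. The sketch below only builds the SQL strings; rollup_sql is a hypothetical helper, the state_id and country_id foreign keys are assumptions (only city_id appears above), and the COALESCE to 0 for parents without children is my own choice, not something the question asks for.

```ruby
# Builds one UPDATE that sums child rows into their parent.
# Table and column names are hard-coded constants here; never interpolate
# user input into SQL this way.
def rollup_sql(parent_table, child_table, foreign_key)
  <<~SQL.gsub(/\s+/, " ").strip
    UPDATE #{parent_table} SET data = (
      SELECT COALESCE(SUM(#{child_table}.data), 0)
      FROM #{child_table}
      WHERE #{child_table}.#{foreign_key} = #{parent_table}.id
    )
  SQL
end

# Bottom-up order: each level must be complete before the next one is summed.
# state_id and country_id are assumed column names.
ROLLUP_STEPS = [
  %w[cities neighbourhoods city_id],
  %w[states cities state_id],
  %w[countries states country_id]
].freeze

statements = ROLLUP_STEPS.map { |parent, child, fk| rollup_sql(parent, child, fk) }
```

Each string could then be passed to City.connection.execute in order, ideally inside a transaction.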
Upvotes: 0
Reputation: 429
Depending on how your model associations are set up, you should be able to take advantage of preloading.
For example:
class City < ApplicationRecord
  has_many :neighborhoods
end
class Neighborhood < ApplicationRecord
  belongs_to :city
  belongs_to :state
end
class State < ApplicationRecord
  belongs_to :country
  has_many :neighborhoods
end
class Country < ApplicationRecord
  has_many :states
end
cities = City.all.includes(neighborhoods: { state: :country })
cities.each do |city|
...
end
Upvotes: 0