Reputation: 11920
I have a website hosted on a Heroku Dyno that allows max 512MB of memory.
My site allows users to upload raw time series data in CSV format, and I wanted to load test the performance of uploading a CSV with ~100k rows (3.2 MB in size). The UI lets the user upload the file, which in turn kicks off a Sidekiq job that imports each row in the file into my database. The uploaded file is stored under /tmp
on the dyno, which I believe gets cleared on each periodic restart of the dyno.
Everything actually finished without error, and all 100k rows were inserted. But several hours later I noticed my site was almost unresponsive and I checked Heroku metrics.
At the exact time I had started the upload, the memory usage began to grow and quickly exceeded the maximum 512MB.
The logs confirmed this fact -
# At the start of the job
Aug 22 14:45:51 gb-staging heroku/web.1: source=web.1 dyno=heroku.31750439.f813c7e7-0328-48f8-89d5-db79783b3024 sample#memory_total=412.68MB sample#memory_rss=398.33MB sample#memory_cache=14.36MB sample#memory_swap=0.00MB sample#memory_pgpgin=317194pages sample#memory_pgpgout=211547pages sample#memory_quota=512.00MB
# ~1 hour later
Aug 22 15:53:24 gb-staging heroku/web.1: source=web.1 dyno=heroku.31750439.f813c7e7-0328-48f8-89d5-db79783b3024 sample#memory_total=624.80MB sample#memory_rss=493.34MB sample#memory_cache=0.00MB sample#memory_swap=131.45MB sample#memory_pgpgin=441565pages sample#memory_pgpgout=315269pages sample#memory_quota=512.00MB
Aug 22 15:53:24 gb-staging heroku/web.1: Process running mem=624M(122.0%)
I can restart the Dyno to clear this issue, but I don't have much experience in looking at metrics so I wanted to understand what was happening.
Just a bit lost on where to start investigating.
Thanks!
Edit: We have the Heroku New Relic add-on, which also collects data. Annoyingly enough, New Relic reports a different/normal memory usage value for that same time period. Is this common? What's it measuring?
Upvotes: 4
Views: 1653
Reputation: 5556
These are the most probable reasons for that:
Scenario 1. You process the whole file at once: first you load every record from the CSV into memory, do some processing, then iterate over the results and store them in the database.
If that's the case, you need to change your implementation to process the file in batches. Load 100 records, process them, store them in the database, repeat. You can also look at the activerecord-import
gem to speed up your inserts.
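Here is a minimal sketch of that batched approach; the DataPoint model, the column names and the file path are placeholders for your own schema, not part of your actual code:
require 'csv'

BATCH_SIZE = 100

# Stream the CSV instead of loading it all into memory, and insert in batches.
# DataPoint and the column names below are placeholders for your own schema.
CSV.foreach('/tmp/upload.csv', headers: true).each_slice(BATCH_SIZE) do |rows|
  records = rows.map do |row|
    DataPoint.new(measured_at: row['timestamp'], value: row['value'])
  end
  # With activerecord-import the whole batch becomes a single INSERT statement.
  DataPoint.import(records)
end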
Scenario 2. You have a memory leak in your script. Maybe you process in batches, but you hold references to unused objects and they are never garbage collected.
You can find out by using the ObjectSpace
module. It has some pretty useful methods. count_objects
returns a hash with counts of the different objects currently on the heap:
ObjectSpace.count_objects
=> {:TOTAL=>30162, :FREE=>11991, :T_OBJECT=>223, :T_CLASS=>884, :T_MODULE=>30, :T_FLOAT=>4, :T_STRING=>12747, :T_REGEXP=>165, :T_ARRAY=>1675, :T_HASH=>221, :T_STRUCT=>2, :T_BIGNUM=>2, :T_FILE=>5, :T_DATA=>1232, :T_MATCH=>105, :T_COMPLEX=>1, :T_NODE=>838, :T_ICLASS=>37}
It's just a hash, so you can look for a specific type of object:
ObjectSpace.count_objects[:T_STRING]
=> 13089
You can plug this snippet into different points in your script to see how many objects are on the heap at a given moment. To get consistent results you should manually trigger the garbage collector before checking the counts; that ensures you only see live objects.
GC.start
ObjectSpace.count_objects[:T_STRING]
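As a rough sketch, you can diff those counts around a single step to see what it allocates and retains; import_batch and rows here stand for whatever code you suspect:
def count_retained
  GC.start
  before = ObjectSpace.count_objects.dup
  yield
  GC.start
  after = ObjectSpace.count_objects
  after.each do |type, count|
    delta = count - before.fetch(type, 0)
    puts "#{type}: +#{delta}" if delta > 0
  end
end

count_retained { import_batch(rows) }  # import_batch/rows are placeholders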
Another useful method is each_object
, which iterates over all objects currently on the heap:
ObjectSpace.each_object { |o| puts o.inspect }
Or you can iterate over objects of one class:
ObjectSpace.each_object(String) { |o| puts o.inspect }
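On top of that, a rough way (MRI only) to see which class keeps accumulating instances is to tally live objects by class:
GC.start
counts = ObjectSpace.each_object.each_with_object(Hash.new(0)) do |obj, tally|
  tally[obj.class] += 1
end
counts.sort_by { |_, n| -n }.first(10).each { |klass, n| puts "#{klass}: #{n}" }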
Scenario 3. You have a memory leak in a gem or a system library.
This is like the previous scenario, but the problem is not in your code. You can also find this using ObjectSpace
. If you see objects retained after calling a library method, there is a chance that the library has a memory leak. The solution would be to update that library.
Take a look at this repo. It maintains a list of gems with known memory leak problems. If you use something from that list, I suggest updating it quickly.
Now, addressing your other questions. If you have a perfectly healthy app on Heroku or any other provider, you will always see memory increase over time, but it should stabilise at some point. Heroku restarts dynos about once a day, so on your metrics you will see sudden drops followed by a slow increase over the span of a couple of days.
And New Relic by default shows averaged data from all instances. You should probably switch to showing data only from your worker dyno to see the correct memory usage.
Finally, I recommend reading this article about how Ruby uses memory. It mentions many useful tools, derailed_benchmarks in particular. It was created by someone who worked at Heroku at the time, and it is a collection of benchmarks for the most common problems people have on Heroku.
Upvotes: 2