NGInd

Reputation: 81

Memory Allocation in Ruby

I have a large dataset of some 16000 nodes. For each node, I find the k nearest neighbours based on a similarity measure. The top k neighbours reside in a self-implemented priority queue. As the simulation proceeds (calculating the k-NN for every node), memory usage keeps increasing and the simulation slows down. If I want to delete the priority queues of previous nodes as I move forward, in order to free memory, how can I do it? Is this the only possible reason for the slow performance, or could there be other reasons too?

Upvotes: 1

Views: 1791

Answers (2)

quetzalcoatl

Reputation: 33566

If you allow your queue to grow, then you usually should design some way of shrinking it, too.

You call your structure a 'queue', so I assume that some elements flow in (i.e. are added to an array) and some elements flow out. What do you do with them when they flow out? Do you remove them from the queue? Are you nil-ing their variables or their specific array cells when they are removed from the queue? Do you shrink the arrays after many old elements have been nilled?

If you are, then it should be OK.
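As a sketch of what "nil-ing the vacated cell" can look like, here is a hypothetical fixed-capacity ring buffer (the class and method names are mine, not from the question's code):

```ruby
# Hypothetical fixed-capacity ring buffer. Nil-ing the vacated cell on
# dequeue drops the buffer's reference, so the dequeued object becomes
# collectable as soon as the caller is done with it.
class RingQueue
  attr_reader :size

  def initialize(capacity)
    @buf = Array.new(capacity)
    @capacity = capacity
    @head = 0
    @size = 0
  end

  def enqueue(item)
    raise "queue full" if @size == @capacity
    @buf[(@head + @size) % @capacity] = item
    @size += 1
    item
  end

  def dequeue
    raise "queue empty" if @size.zero?
    item = @buf[@head]
    @buf[@head] = nil                 # drop the reference so GC can reclaim it
    @head = (@head + 1) % @capacity
    @size -= 1
    item
  end
end
```

If that `@buf[@head] = nil` line were missing, the buffer would keep every dequeued object alive until its cell happened to be overwritten by a later enqueue.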

Ruby has a GC, so every object that is "lost" should be automatically removed at some point in time. Note the 'should' and 'at some point': it's hard to tell or guarantee when. If you have lots of free memory, then maybe the GC simply has not run yet. Try kicking it manually and see if the memory usage drops.
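Kicking the GC manually is a one-liner with Ruby's built-in GC module (exact statistics keys vary between Ruby versions, but `:count` is widely available):

```ruby
# Ask the collector to run a full collection now, then inspect it.
GC.start                 # force a GC cycle immediately
stats = GC.stat          # a Hash of collector statistics
puts stats[:count]       # total number of GC runs so far
```

If memory usage drops noticeably after `GC.start`, the objects were collectable all along and the GC just hadn't run; if it doesn't drop, something is still holding references to them.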

If you are not removing old entries, then it will never be OK. Until you make the queue actually forget the objects, the objects will stay alive and will occupy memory. See above.

If you are only nil-ing them and you never shrink or reuse the old space in the arrays, then the GC will sweep the detached old objects, but your arrays will still grow over time. It's not wise to have an array of 1000000 elements where 999900 are nil. Splice the array, copy it to a smaller one, or similar, and adjust your algorithm, because the indexes of the living elements will change.
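One simple way to do that splicing in Ruby is `Array#compact!`, rebuilding any stored indexes afterwards (the `index_of` lookup here is just an illustration):

```ruby
# A mostly-nil array wastes space; compact! drops the nil cells in place.
items = [nil, nil, :a, nil, :b, nil]
items.compact!                       # items is now [:a, :b]

# Indexes of the survivors have changed, so any stored indexes must be
# recomputed, e.g. via a hypothetical lookup table:
index_of = items.each_with_index.to_h
```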

There is of course one more case: you are doing everything properly, the GC works, and lost objects are removed, yet the queue simply grows to an enormous size because elements are not dequeued (processed and removed) fast enough. For example, if you add 1000 new elements per second and the worker thread removes 10 elements per second, then after an hour you will have a nice (and growing) backlog even though everything works properly. You get the idea. This is not easily solvable; you must double-check and correct your whole design.

For example, for quick patches, you might:

  • trivial: ensure the processing is faster, so the queue empties faster
  • enforce a limit on the length of the queue, e.g. 1000 elements, and make the queue reject any attempt to add more (i.e. raise an exception)
  • enforce a limit on the length of the queue, e.g. 1000 elements, and silently remove the least-important elements automatically. This has some drawbacks:
    • no one knows they were not processed
    • there can still be 1001+ elements of the same importance; what then? It's hard to decide what to drop, or else the 'limit' will be weak and guarantee nothing
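The second "silently drop" variant actually fits a top-k neighbour search quite naturally, since the lowest-similarity entry is exactly the one you never needed. A sketch, using a plain sorted array as a stand-in for the self-implemented priority queue (all names here are hypothetical):

```ruby
# Hypothetical bounded top-k structure: keeps at most `limit` entries and
# silently drops the lowest-scored one when full, so memory per node
# stays bounded no matter how many candidates are offered.
class BoundedTopK
  def initialize(limit)
    @limit = limit
    @items = []                          # kept sorted descending by score
  end

  def add(item, score)
    @items << [score, item]
    @items.sort_by! { |score, _| -score }
    @items.pop if @items.size > @limit   # drop the lowest-scored entry
  end

  def to_a
    @items.map { |_, item| item }
  end
end
```

Usage: `BoundedTopK.new(k)` per node; offer every candidate via `add`, and `to_a` yields the k best. (A real implementation would use a heap instead of re-sorting on every insert, but the bounding logic is the point here.)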

But those are just hints. In such a case you must rethink it yourself, because only you know the important bits about the exact requirements that dictate what you can and what you can't forget.

Upvotes: 2

Trygve Flathen

Reputation: 696

Ruby's garbage collection should take care of this, as long as you don't keep references to the objects you want to free. Make sure there are no references left.

You may also want to look at the GC module: http://www.ruby-doc.org/core-2.1.0/GC.html
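Concretely, for the per-node queues in the question this just means dropping the reference once a node's neighbours have been consumed. A sketch, where `queues` is a hypothetical Hash of per-node priority queues:

```ruby
# Dropping the last reference to a finished node's queue makes it
# eligible for garbage collection.
queues = {}
queues[:node_1] = [:a, :b, :c]    # stand-in for a per-node priority queue

result = queues[:node_1].dup      # keep only what is still needed
queues.delete(:node_1)            # no reference left -> GC can reclaim it
GC.start                          # optional: trigger a collection now
```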

Upvotes: 4
