otakustay

Reputation: 12405

About closure, LexicalEnvironment and GC

As of ECMAScript v5, each time control enters executable code, the engine creates a LexicalEnvironment (LE) and a VariableEnvironment (VE). For function code, these two refer to the same environment, which is the result of calling NewDeclarativeEnvironment (ECMAScript v5 10.4.3). All variables declared in function code are stored in the environment record component of the VariableEnvironment (ECMAScript v5 10.5). This is the basic mechanism behind closures.

What confuses me is how garbage collection works with this closure approach. Suppose I have code like:

function f1() {
    var o = LargeObject.fromSize('10MB');
    return function() {
        // here never uses o
        return 'Hello world';
    }
}
var f2 = f1();

After the line var f2 = f1(), our object graph would be:

global -> f2 -> f2's VariableEnvironment -> f1's VariableEnvironment -> o

As far as I know, if the JavaScript engine uses reference counting for garbage collection, the object o has at least 1 reference and would never be GCed. Apparently this wastes memory, since o is never used but always stays in memory.

Someone may say the engine knows that f2's VariableEnvironment doesn't use f1's VariableEnvironment, so the entire f1's VariableEnvironment could be GCed. Here is another code snippet which leads to a more complex situation:

function f1() {
    var o1 = LargeObject.fromSize('10MB');
    var o2 = LargeObject.fromSize('10MB');
    return function() {
        alert(o1);
    }
}
var f2 = f1();

In this case, f2 uses the o1 object, which is stored in f1's VariableEnvironment, so f2's VariableEnvironment must keep a reference to f1's VariableEnvironment. As a result, o2 cannot be GCed either, which wastes even more memory.

So I would like to ask: how do modern JavaScript engines (JScript.dll / V8 / SpiderMonkey ...) handle this situation? Is there a rule specified by the standard, or is it implementation dependent? And what exact steps does a JavaScript engine take with such an object graph when it runs garbage collection?
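The chain described above can be modelled with plain objects. This is only a sketch of the specification's concepts, not of how any engine lays out memory; newDeclarativeEnvironment and lookup are hypothetical helpers, and the strings stand in for the large objects.

```javascript
// Toy model of ES5 environments: a record of bindings plus a link
// to the outer environment.
function newDeclarativeEnvironment(outer) {
    return { record: Object.create(null), outer: outer };
}

// Resolve a name by walking the chain outward.
function lookup(env, name) {
    for (var e = env; e !== null; e = e.outer) {
        if (name in e.record) {
            return e.record[name];
        }
    }
    throw new ReferenceError(name + " is not defined");
}

var globalEnv = newDeclarativeEnvironment(null);
var f1Env = newDeclarativeEnvironment(globalEnv);
f1Env.record.o1 = "10MB object #1"; // stand-in for a large object
f1Env.record.o2 = "10MB object #2"; // stand-in for a large object

// Environment of the returned function; its outer link keeps
// all of f1Env reachable, including the unused o2.
var f2Env = newDeclarativeEnvironment(f1Env);

var resolved = lookup(f2Env, "o1");
var o2StillReachable = "o2" in f2Env.outer.record;
```

In this naive model the whole record is one unit, so holding o1 also holds o2. Whether a real engine behaves this way is exactly what the question asks.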

Thanks.

Upvotes: 15

Views: 1121

Answers (3)

Shihua Ma

Reputation: 11

There is no standard specification for GC implementation; every engine has its own. I know a little about V8, which has a very impressive garbage collector (stop-the-world, generational, accurate). For example 2 above, the V8 engine takes the following steps:

  1. Create f1's VariableEnvironment object, called f1.
  2. After creating that object, V8 creates an initial hidden class for f1, called H1.
  3. Record at the root level that f1 points to f2.
  4. Create another hidden class H2, based on H1, and add information to H2 describing the object as having one property, o1, stored at offset 0 in the f1 object.
  5. Update f1 to point to H2, indicating that f1 should use H2 instead of H1.
  6. Create another hidden class H3, based on H2, and add the property o2, stored at offset 1 in the f1 object.
  7. Update f1 to point to H3.
  8. Create the anonymous function's VariableEnvironment object, called a1.
  9. Create an initial hidden class for a1, called A1.
  10. Record that a1's parent is f1.

When the parser meets a function literal, it creates a FunctionBody, but it only parses the FunctionBody when the function is called. The following code shows that no error is thrown at parse time:

function p(){
  return function(){alert(a)}
}
p();

So at GC time H1 and H2 will be swept, because nothing references them. In my mind, if the code is lazily compiled, there is no way to know in advance that the o1 variable used in a1 is a reference into f1. It uses a JIT.
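A small check of the behaviour the snippet above relies on: the undeclared name only fails when the inner function actually runs. Note that this is guaranteed by the language (identifiers are resolved at run time), so on its own it does not prove that an engine parses lazily. The name undeclaredNameForDemo is made up for illustration.

```javascript
function p() {
    return function () {
        return undeclaredNameForDemo; // never declared anywhere
    };
}

var inner = p(); // creating the closure does not throw

var errorName = null;
try {
    inner(); // the lookup fails only here, when the body runs
} catch (e) {
    errorName = e.name;
}
```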

Upvotes: 1

waxwing

Reputation: 18783

tl;dr answer: "Only variables referenced from inner fns are heap allocated in V8. If you use eval then all vars assumed referenced.". In your second example, o2 can be allocated on the stack and is thrown away after f1 exits.


I don't think they can handle it. At least we know that some engines cannot, as this is known to be the cause of many memory leaks, as for example:

function outer(node) {
    node.onclick = function inner() { 
        // some code not referencing "node"
    };
}

where inner closes over node, forming a circular reference inner -> outer's VariableEnvironment -> node -> inner, which will never be freed in, for instance, IE6, even if the DOM node is removed from the document. Some browsers handle this just fine though: circular references themselves are not a problem, it's the GC implementation in IE6 that is the problem. But now I digress from the subject.

A common way to break the circular reference is to null out all unnecessary variables at the end of outer, i.e., set node = null. The question is then whether modern JavaScript engines can do this for you: can they somehow infer that a variable is not used within inner?
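A minimal sketch of that workaround. A plain object stands in for the DOM node here so the snippet runs outside a browser; that substitution is an assumption made for illustration.

```javascript
function outer(node) {
    node.onclick = function inner() {
        // some code not referencing "node"
        return "clicked";
    };
    // Drop the closure's only route back to the node, breaking
    // inner -> outer's variables -> node -> inner.
    node = null;
}

var fakeNode = {}; // hypothetical stand-in for a DOM element
outer(fakeNode);
var result = fakeNode.onclick();
```

The handler keeps working because it never needed node; only the variable binding inside outer is cleared.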

I think the answer is no, but I can be proven wrong. The reason is that the following code executes just fine:

function get_inner_function() {
    var x = "very big object";
    var y = "another big object";
    return function inner(varName) {
        alert(eval(varName));
    };
}

func = get_inner_function();

func("x");
func("y");

See for yourself using this jsfiddle example. There are no references to either x or y inside inner, but they are still accessible using eval. (Amazingly, if you alias eval to something else, say myeval, and call myeval, the code is NOT evaluated in the caller's execution context: it runs in the global environment instead, so x and y are not visible. This is even in the specification, see sections 10.4.2 and 15.1.2.1.1 in ECMA-262.)
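The difference between the two kinds of call can be checked directly. This sketch assumes there is no global variable named hiddenValue; the names hiddenValue, myeval and getInnerFunctions are made up for the example.

```javascript
function getInnerFunctions() {
    var hiddenValue = "very big object";
    var myeval = eval; // alias: calls through it are indirect evals

    return {
        direct: function (varName) {
            return eval(varName); // direct eval sees the enclosing scope
        },
        indirect: function (varName) {
            try {
                return myeval(varName); // evaluated in the global scope
            } catch (e) {
                return e.name;
            }
        }
    };
}

var fns = getInnerFunctions();
var viaDirect = fns.direct("hiddenValue");     // "very big object"
var viaIndirect = fns.indirect("hiddenValue"); // "ReferenceError"
```

Only the direct call forces the engine to keep every outer variable alive, which is why an engine can treat a syntactic call to eval as a special case.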


Edit: As per your comment, it appears that some modern engines actually do some smart tricks, so I tried to dig a little more. I came across this forum thread discussing the issue, and in particular, a link to a tweet about how variables are allocated in V8. It also specifically touches on the eval problem. It seems that the engine has to parse the code in all inner functions and see which variables are referenced, or whether eval is used, and then determine whether each variable should be allocated on the heap or on the stack. Pretty neat. Here is another blog that contains a lot of details on the ECMAScript implementation.
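The decision rule quoted in the tweet can be written down as a toy function. This is a sketch of the rule as stated, not V8's actual analysis; heapAllocatedVariables and the shape of its inputs are hypothetical, and a real engine works on the parsed syntax tree rather than on name lists.

```javascript
// declared:       names declared in the outer function
// innerFunctions: [{ references: [names], usesDirectEval: boolean }]
function heapAllocatedVariables(declared, innerFunctions) {
    var usesEval = innerFunctions.some(function (fn) {
        return fn.usesDirectEval;
    });
    if (usesEval) {
        // eval could name anything, so every variable is assumed referenced
        return declared.slice();
    }
    // Otherwise keep only the names some inner function mentions.
    return declared.filter(function (name) {
        return innerFunctions.some(function (fn) {
            return fn.references.indexOf(name) !== -1;
        });
    });
}

// Second example from the question: only o1 is referenced.
var example2 = heapAllocatedVariables(
    ["o1", "o2"],
    [{ references: ["o1"], usesDirectEval: false }]
);

// The eval example: nothing is named, but everything must be kept.
var withEval = heapAllocatedVariables(
    ["x", "y"],
    [{ references: ["varName"], usesDirectEval: true }]
);
```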

This has the implication that even if an inner function never "escapes" the call, it can still force variables to be allocated on the heap. E.g.:

function init(node) {

    var someLargeVariable = "...";

    function drawSomeWidget(x, y) {
        library.draw(x, y, someLargeVariable);
    }

    drawSomeWidget(1, 1);
    drawSomeWidget(101, 1);

    return function () {
        alert("hi!");
    };
}

Now, as init has finished its call, someLargeVariable is no longer referenced and should be eligible for deletion, but I suspect that it is not, unless the inner function drawSomeWidget has been optimized away (inlined?). If so, this could probably occur pretty frequently when using self-executing functions to mimic classes with private / public methods.


Answer to Raynos's comment below: I tried the above scenario (slightly modified) in the debugger, and the results are as I predicted, at least in Chrome:

[Screenshot of Chrome debugger]

When the inner function is being executed, someLargeVariable is still in scope.

If I comment out the reference to someLargeVariable in the inner drawSomeWidget method, then you get a different result:

[Screenshot of Chrome debugger 2]

Now someLargeVariable is not in scope, because it could be allocated on the stack.

Upvotes: 10

Mike Samuel

Reputation: 120526

if the javascript engine uses a reference counting method

Most JavaScript engines use some variant of a compacting mark-and-sweep garbage collector, not a simple reference-counting GC, so reference cycles do not cause problems.
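To see why cycles are harmless under mark and sweep, here is a toy mark phase over a graph of object ids. It is a sketch only: the graph, the ids and the mark function are invented for illustration, and real collectors also compact memory and use far more elaborate bookkeeping.

```javascript
// edges maps each object id to the ids it references.
function mark(roots, edges) {
    var marked = new Set();
    var stack = roots.slice();
    while (stack.length > 0) {
        var id = stack.pop();
        if (marked.has(id)) {
            continue; // already visited, so cycles terminate
        }
        marked.add(id);
        (edges[id] || []).forEach(function (child) {
            stack.push(child);
        });
    }
    return marked;
}

var edges = {
    global: ["f2"],
    f2: ["env"],
    env: ["o"],
    o: [],
    inner: ["node"], // inner and node reference each other...
    node: ["inner"]  // ...but nothing reachable from a root points at them
};

var live = mark(["global"], edges);
var garbage = Object.keys(edges).filter(function (id) {
    return !live.has(id);
});
```

Here garbage comes out as inner and node: each still has a nonzero reference count, yet neither is reachable from a root, so a tracing collector frees both.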

They also tend to do some tricks so that cycles that involve DOM nodes (which are reference counted by the browser outside the JavaScript heap) don't introduce uncollectible cycles. The XPCOM cycle collector does this for Firefox.

The cycle collector spends most of its time accumulating (and forgetting about) pointers to XPCOM objects that might be involved in garbage cycles. This is the idle stage of the collector's operation, in which special variants of nsAutoRefCnt register and unregister themselves very rapidly with the collector, as they pass through a "suspicious" refcount event (from N+1 to N, for nonzero N).

Periodically the collector wakes up and examines any suspicious pointers that have been sitting in its buffer for a while. This is the scanning stage of the collector's operation. In this stage the collector repeatedly asks each candidate for a singleton cycle-collection helper class, and if that helper exists, the collector asks the helper to describe the candidate's (owned) children. This way the collector builds a picture of the ownership subgraph reachable from suspicious objects.

If the collector finds a group of objects that all refer back to one another, and establishes that the objects' reference counts are all accounted for by internal pointers within the group, it considers that group cyclical garbage, which it then attempts to free. This is the unlinking stage of the collector's operation. In this stage the collector walks through the garbage objects it has found, again consulting with their helper objects, asking the helper objects to "unlink" each object from its immediate children.

Note that the collector also knows how to walk through the JS heap, and can locate ownership cycles that pass in and out of it.
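The "reference counts are all accounted for by internal pointers" test from the quoted description can be sketched as follows. This is a simplified illustration under assumed data structures (a Map of id to refCount and children), not the XPCOM collector's code; findCyclicGarbage is a hypothetical name.

```javascript
// objects: Map of id -> { refCount: number, children: [ids] }
function findCyclicGarbage(suspect, objects) {
    // 1. Walk the subgraph reachable from the suspect, counting how
    //    many references to each object come from inside the subgraph.
    var seen = new Set();
    var internal = new Map();
    var stack = [suspect];
    while (stack.length > 0) {
        var id = stack.pop();
        if (seen.has(id)) continue;
        seen.add(id);
        objects.get(id).children.forEach(function (child) {
            internal.set(child, (internal.get(child) || 0) + 1);
            stack.push(child);
        });
    }

    // 2. An object with more references than the subgraph explains is
    //    held from outside; it and everything it reaches stay alive.
    var live = new Set();
    var work = Array.from(seen).filter(function (id) {
        return objects.get(id).refCount > (internal.get(id) || 0);
    });
    while (work.length > 0) {
        var liveId = work.pop();
        if (live.has(liveId)) continue;
        live.add(liveId);
        objects.get(liveId).children.forEach(function (child) {
            work.push(child);
        });
    }

    // 3. Whatever is left is cyclic garbage.
    return Array.from(seen).filter(function (id) {
        return !live.has(id);
    });
}

// A node and its handler reference only each other.
var detached = new Map([
    ["node", { refCount: 1, children: ["handler"] }],
    ["handler", { refCount: 1, children: ["node"] }]
]);
var garbageIds = findCyclicGarbage("node", detached);

// Same cycle, but the document still holds the node (refCount 2).
var attached = new Map([
    ["node", { refCount: 2, children: ["handler"] }],
    ["handler", { refCount: 1, children: ["node"] }]
]);
var stillLive = findCyclicGarbage("node", attached);
```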

ECMAScript Harmony is likely to include ephemerons as well, to provide weakly held references.
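Ephemeron-style tables did arrive in ECMAScript 2015 as WeakMap. A minimal sketch of the usual use, attaching data to an object without keeping that object alive; the names metadata, attachInfo and widget are made up, and a plain object stands in for a DOM node.

```javascript
var metadata = new WeakMap();

function attachInfo(target, info) {
    // The key is held weakly: this entry does not stop `target`
    // from being collected once nothing else references it.
    metadata.set(target, info);
}

var widget = { tag: "div" }; // hypothetical stand-in for a DOM node
attachInfo(widget, { clicks: 0 });

var info = metadata.get(widget);
var known = metadata.has(widget);
var unknown = metadata.has({ tag: "div" }); // a different object, so no entry
```

Collection itself cannot be observed deterministically from script, so the sketch only shows the lookup behaviour.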

You might find "The future of XPCOM memory management" interesting.

Upvotes: 0
