Reputation: 12405
as ECMAScriptv5, each time when control enters a code, the enginge creates a LexicalEnvironment(LE) and a VariableEnvironment(VE), for function code, these 2 objects are exactly the same reference which is the result of calling NewDeclarativeEnvironment(ECMAScript v5 10.4.3), and all variables declared in function code are stored in the environment record componentof VariableEnvironment(ECMAScript v5 10.5), and this is the basic concept for closure.
What confused me is how Garbage Collect works with this closure approach, suppose I have code like:
function f1() {
var o = LargeObject.fromSize('10MB');
return function() {
// here never uses o
return 'Hello world';
}
}
var f2 = f1();
after the line var f2 = f1()
, our object graph would be:
global -> f2 -> f2's VariableEnvironment -> f1's VariableEnvironment -> o
so as from my little knowledge, if the javascript engine uses a reference counting method for garbage collection, the object o
has at lease 1 refenrence and would never be GCed. Appearently this would result a waste of memory since o
would never be used but is always stored in memory.
Someone may said the engine knows that f2's VariableEnvironment doesn't use f1's VariableEnvironment, so the entire f1's VariableEnvironment would be GCed, so there is another code snippet which may lead to more complex situation:
function f1() {
var o1 = LargeObject.fromSize('10MB');
var o2 = LargeObject.fromSize('10MB');
return function() {
alert(o1);
}
}
var f2 = f1();
in this case, f2
uses the o1
object which stores in f1's VariableEnvironment, so f2's VariableEnvironment must keep a reference to f1's VariableEnvironment, which result that o2
cannot be GCed as well, which further result in a waste of memory.
so I would ask, how modern javascript engine (JScript.dll / V8 / SpiderMonkey ...) handles such situation, is there a standard specified rule or is it implementation based, and what is the exact step javascript engine handles such object graph when executing Garbage Collection.
Thanks.
Upvotes: 15
Views: 1121
Reputation: 11
There is no standard specifications of implementation for GC, every engine have their own implementation. I know a little concept of v8, it has a very impressive garbage collector (stop-the-world, generational, accurate). As above example 2, the v8 engine has following step:
On parse function literal, it create FunctionBody. Only parse FunctionBody when function was called.The following code indicate it not throw error while parser time
function p(){
return function(){alert(a)}
}
p();
So at GC time H1, H2 will be swept, because no reference point that.In my mind if the code is lazily compiled, no way to indicate o1 variable declared in a1 is a reference to f1, It use JIT.
Upvotes: 1
Reputation: 18783
tl;dr answer: "Only variables referenced from inner fns are heap allocated in V8. If you use eval then all vars assumed referenced.". In your second example, o2
can be allocated on the stack and is thrown away after f1
exits.
I don't think they can handle it. At least we know that some engines cannot, as this is known to be the cause of many memory leaks, as for example:
function outer(node) {
node.onclick = function inner() {
// some code not referencing "node"
};
}
where inner
closes over node
, forming a circular reference inner -> outer's VariableContext -> node -> inner
, which will never be freed in for instance IE6, even if the DOM node is removed from the document. Some browsers handle this just fine though: circular references themselves are not a problem, it's the GC implementation in IE6 that is the problem. But now I digress from the subject.
A common way to break the circular reference is to null out all unnecessary variables at the end of outer
. I.e., set node = null
. The question is then whether modern javascript engines can do this for you, can they somehow infer that a variable is not used within inner
?
I think the answer is no, but I can be proven wrong. The reason is that the following code executes just fine:
function get_inner_function() {
var x = "very big object";
var y = "another big object";
return function inner(varName) {
alert(eval(varName));
};
}
func = get_inner_function();
func("x");
func("y");
See for yourself using this jsfiddle example. There are no references to either x
or y
inside inner
, but they are still accessible using eval
. (Amazingly, if you alias eval
to something else, say myeval
, and call myeval
, you DO NOT get a new execution context - this is even in the specification, see sections 10.4.2 and 15.1.2.1.1 in ECMA-262.)
Edit: As per your comment, it appears that some modern engines actually do some smart tricks, so I tried to dig a little more. I came across this forum thread discussing the issue, and in particular, a link to a tweet about how variables are allocated in V8. It also specifically touches on the eval
problem. It seems that it has to parse the code in all inner functions. and see what variables are referenced, or if eval
is used, and then determine whether each variable should be allocated on the heap or on the stack. Pretty neat. Here is another blog that contains a lot of details on the ECMAScript implementation.
This has the implication that even if an inner function never "escapes" the call, it can still force variables to be allocated on the heap. E.g.:
function init(node) {
var someLargeVariable = "...";
function drawSomeWidget(x, y) {
library.draw(x, y, someLargeVariable);
}
drawSomeWidget(1, 1);
drawSomeWidget(101, 1);
return function () {
alert("hi!");
};
}
Now, as init
has finished its call, someLargeVariable
is no longer referenced and should be eligible for deletion, but I suspect that it is not, unless the inner function drawSomeWidget
has been optimized away (inlined?). If so, this could probably occur pretty frequently when using self-executing functions to mimick classes with private / public methods.
Answer to Raynos comment below. I tried the above scenario (slightly modified) in the debugger, and the results are as I predict, at least in Chrome:
When the inner function is being executed, someLargeVariable is still in scope.
If I comment out the reference to someLargeVariable
in the inner drawSomeWidget
method, then you get a different result:
Now
someLargeVariable
is not in scope, because it could be allocated on the stack.
Upvotes: 10
Reputation: 120526
if the javascript engine uses a reference counting method
Most javascript engine's use some variant of a compacting mark and sweep garbage collector, not a simple reference counting GC, so reference cycles do not cause problems.
They also tend to do some tricks so that cycles that involve DOM nodes (which are reference counted by the browser outside the JavaScript heap) don't introduce uncollectible cycles. The XPCOM cycle collector does this for Firefox.
The cycle collector spends most of its time accumulating (and forgetting about) pointers to XPCOM objects that might be involved in garbage cycles. This is the idle stage of the collector's operation, in which special variants of
nsAutoRefCnt
register and unregister themselves very rapidly with the collector, as they pass through a "suspicious" refcount event (from N+1 to N, for nonzero N).Periodically the collector wakes up and examines any suspicious pointers that have been sitting in its buffer for a while. This is the scanning stage of the collector's operation. In this stage the collector repeatedly asks each candidate for a singleton cycle-collection helper class, and if that helper exists, the collector asks the helper to describe the candidate's (owned) children. This way the collector builds a picture of the ownership subgraph reachable from suspicious objects.
If the collector finds a group of objects that all refer back to one another, and establishes that the objects' reference counts are all accounted for by internal pointers within the group, it considers that group cyclical garbage, which it then attempts to free. This is the unlinking stage of the collectors operation. In this stage the collector walks through the garbage objects it has found, again consulting with their helper objects, asking the helper objects to "unlink" each object from its immediate children.
Note that the collector also knows how to walk through the JS heap, and can locate ownership cycles that pass in and out of it.
EcmaScript harmony is likely to include ephemerons as well to provide weakly held references.
You might find "The future of XPCOM memory management" interesting.
Upvotes: 0