Reputation: 15788
Suppose I allocation some large object (e.g. a vector of size N, which might be very large) and perform a sequence of m operations on it:
fm( .. f3( f2( f1( vec ) ) ) )
with each returning a collection of size N.
For simplicity let's assume each f is quite simple
def f5(vec: Vector[Int]) = { gc(); f6( vec.map(_+1) ) }
So, vec no longer has future references at the point where each subsequent call is made. (f1's vec parameter is never used after f2 is entered, and so forth for each call)
However, because most JVMs don't decrement references until the stack unwinds (AFAIK), isn't my program required to consume NxM memory. By comparison in the following style only 2xM is required (and less in other implementations)
var vec:Vector[Int] = ...
for ( f <- F ) {
vec = f(vec)
gc()
}
Does the same issue exist for tail recursive methods?
This isn't just an academic exercise - in some types of big-data type problems, we might to choose N so that our program is fits fully into RAM. In this case, should I be concerned that one style of pipelining is preferable to another?
Upvotes: 1
Views: 208
Reputation: 369458
The calls in your example are tail calls. They really shouldn't have a stack frame allocated at all. However, for various unfortunate reasons, the Scala Language Specification does not mandate Proper Tail Calls, and for similarly unfortunate reasons, the Scala-JVM implementation does not perform Tail Call Optimization.
However, some JVMs have TCO, e.g. the J9 JVM performs TCO, and thus there shouldn't be any additional stack frames allocated, making the intermediate objects unreachable as soon as the next tail call happens. Even JVMs that do not have TCO are able to perform various static (escape analysis, liveness analysis) or dynamic (escape detection, e.g. the Azul Zing JVM does this) analysis that may or may not help with this case.
There are also other implementations of Scala: Scala.js does not perform TCO, as far as I know, but it compiles to ECMAScript, and as of ECMAScript 2015, ECMAScript does have Proper Tail Calls, so as long as the encoding of Scala method calls ends up as ECMAScript function calls, an standards-conforming ECMAScript 2015 engine should eliminate Scala tail calls.
Scala-native currently does not perform TCO, but it will in the future.
Upvotes: 0
Reputation: 43052
To add to @StephenC's answer
I don't know if HotSpot JITs / GCs can do that.
The hotspot jit can do liveness analysis within a method and deem local variables as unreachable even while a frame is still on the stack. This is why JDK9 introduces Reference.reachabilityFence, under some conditions even this
can become unreachable while executing a member method of that instance.
But that optimization only applies when there really nothing in the control flow that can still read that local variable, e.g. no finally blocks or monitor exits. So it would depend on the bytecode generated by scala.
Upvotes: 2
Reputation: 718816
First of all, your question contains a serious misconception, and an example of disastrously bad coding.
However, because most JVMs don't decrement references until the stack unwinds (AFAIK) ...
Actually there are no mainstream JVMs that use reference counting on references at all. Instead, they all use mark-sweep, copying or generational collection algorithms of some kind that do not rely on reference counting.
Next this:
def f5(vec: Vector[Int]) = { gc(); f6( vec.map(_+1) ) }
I think you are trying to "force" a garbage collection with the gc()
call. Don't do this: it is horribly inefficient. And even if you are only doing to investigate memory management behavior, you are most likely distorting that behavior to the extent that what you are seeing is NOT representative of normal Scala code.
Having said that, the answer is basically yes. If your Scala function cannot be tail-call optimized, then there is the potential for a deep recursion to cause garbage retention problems. The only "get out" would be if the JIT compiler was able to tell the GC that certain variables were "dead" at particular points in a method call. I don't know if HotSpot JITs / GCs can do that.
(I guess, another way to do that would be for the Scala compiler to explicitly assign null
to dead reference variables. But that has potential performance issues when you don't have a garbage retention problem!)
Upvotes: 2