Reputation: 97
I read https://www.baeldung.com/java-lambda-effectively-final-local-variables and many other articles (including Stack Overflow answers). However, I still have unanswered questions.
Supplier<Integer> incrementer(int start) {
    return () -> start++;
}
// start is a local variable, and we are trying to modify it inside of a lambda expression.
They say:
Well, notice that we are returning the lambda from our method. Thus, the lambda won't get run until after the start method parameter gets garbage collected. Java has to make a copy of start in order for this lambda to live outside of this method.
The life cycle of the start variable is that of incrementer().
They both exist on the same stack and have a life cycle together. But I don't understand why the article mentions garbage collection and says the lambda won't run until later.
Concurrency Issues
Since the stack is allocated per thread, there can be no concurrency issues. Rather, why do local variables need to be final when it is static member variables that can cause concurrency problems?
Upvotes: 1
Views: 1810
Reputation: 8013
Capturing of the variable has absolutely nothing to do with the concurrent execution or its safety, the reason is completely different.
Before I answer your questions, let me first explain what a lambda expression is.
When you use a lambda expression, a few things happen, both during compilation and at runtime, that are hidden from the developer. It's also worth noting that the lambda expression is part of the Java language; it doesn't exist as such in the generated bytecode.
I'll use the following code as an example:
import java.util.function.Function;

public class GreeterFactory {
    private String header = "Hello ";

    public Function<String, String> createGreeter(int greeterId) {
        // The lambda captures greeterId (a local) and header (via this).
        Function<String, String> greeter = username -> {
            return String.format("(%s) %s: %s", greeterId, header, username);
        };
        return greeter;
    }
}
When javac compiles Java into bytecode, it converts your lambda's body into a new method in the enclosing class (that's why lambda expressions can be thought of as anonymous methods).
Here's what ends up in the bytecode (disassembled with the javap tool):
Compiled from "GreeterFactory.java"
public class various.GreeterFactory {
  private java.lang.String header;
  public various.GreeterFactory();
  public java.util.function.Function<java.lang.String, java.lang.String> createGreeter(int);
  private java.lang.String lambda$createGreeter$0(int, java.lang.String);
}
As you can see, the GreeterFactory class has not only the createGreeter method that I wrote, but also a lambda$createGreeter$0 method that was generated by the compiler.
One thing you may notice is that the generated method has two parameters (int and String), even though my lambda declares only one parameter, a String. The reason is that at runtime this method is called not only with the arguments that I pass (when I execute the apply method from the Function interface), but also with all the "captured" values. Which gets us to point 2:
We already know that the lambda is converted into an actual method. Now the question is: what exactly do I get as the result of evaluating that lambda expression (besides the fact that it's something implementing the Function interface)?
The Function<String, String> greeter variable will actually point to an object that internally:
- holds a reference to this GreeterFactory object (so that it can later call a method on it)
- holds a copy of each captured value (here, greeterId)
- calls the lambda$createGreeter$0 method when it is invoked
You can see this when you inspect that object in the debugger.
Notice that the greeter object has exactly those two values that I mentioned (a reference to this GreeterFactory object and the value 23 that was copied from greeterId).
That's exactly what "capturing" means in the case of a lambda expression.
Later, when apply is executed on this object, it actually calls the lambda$createGreeter$0 method on this GreeterFactory object with all the captured values plus the arguments that you pass into apply.
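The capturing described above can be approximated by hand. This is only a conceptual sketch: the real JVM builds the function object at runtime through invokedynamic and LambdaMetafactory, and the names DesugaredGreeterFactory, CapturedGreeter and lambdaBody are made up for illustration.

```java
import java.util.function.Function;

// Hand-written approximation of what the lambda in createGreeter boils down to.
// All class and method names here are hypothetical.
class DesugaredGreeterFactory {
    private String header = "Hello ";

    Function<String, String> createGreeter(int greeterId) {
        // "Capturing": give the object a reference to this and a copy of greeterId.
        return new CapturedGreeter(this, greeterId);
    }

    // Plays the role of the generated lambda$createGreeter$0(int, String).
    private String lambdaBody(int greeterId, String username) {
        return String.format("(%s) %s: %s", greeterId, header, username);
    }

    // Plays the role of the object that the greeter variable points to.
    private static final class CapturedGreeter implements Function<String, String> {
        private final DesugaredGreeterFactory outer; // captured this
        private final int greeterId;                 // captured copy of the local

        CapturedGreeter(DesugaredGreeterFactory outer, int greeterId) {
            this.outer = outer;
            this.greeterId = greeterId;
        }

        @Override
        public String apply(String username) {
            // Captured values first, then the argument passed to apply.
            return outer.lambdaBody(greeterId, username);
        }
    }
}
```

Calling new DesugaredGreeterFactory().createGreeter(23).apply("Bob") goes through the same steps as the lambda version: the stored copy of greeterId and the argument are both handed to the method that holds the lambda's body.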
I hope I already explained above what "capturing" is and how it works. Let's get to point of final/effectively final.
disclaimer: I didn't find any official information about it, it's just my assumption, therefore: I may be wrong.
Notice that lambdas exist only at the Java language level, not in bytecode. Having explained how lambdas work (generation of a new method), I think it would be technically possible to capture non-effectively-final variables as well.
I think the designers of lambda expressions chose this restriction mainly to help developers write bug-free code.
If captured variables were non-effectively-final, meaning they could be modified outside of the lambda as well as within it, that could lead to much confusion and many misunderstandings from the developers' point of view, effectively leading to many bugs. For example, devs could expect that changing a variable's value within the lambda should affect this variable in the scope of the outer method (because the language does not make it visible that the body of the lambda actually runs in the scope of that newly generated method), or they could expect the opposite. In short: total chaos.
I think that's the reason behind the decision, and that's why the compiler and the language enforce it, i.e. by treating the lambda's scope and the enclosing method's scope as one (even though at runtime those are different scopes).
Notice that previously the same was true for variables captured by anonymous classes, therefore developers are already familiar with such approach.
Why can a lambda freely modify fields of the object? Because it's just a method within the class of this object, and like any other method it has free access to all its members. It would be confusing to expect different behavior.
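To make the field case concrete, here is a minimal sketch, assuming a hypothetical FieldCounter class that is not part of the question. It is the question's incrementer with the counter moved from a method parameter into a field, and it compiles because the lambda captures this rather than a local variable.

```java
import java.util.function.Supplier;

// Hypothetical class for illustration only.
class FieldCounter {
    private int next; // a field lives in the heap object, not in a stack frame

    FieldCounter(int start) {
        this.next = start;
    }

    Supplier<Integer> incrementer() {
        // Compiles: the lambda captures this, and next++ means this.next++.
        return () -> next++;
    }
}
```

Each call to get() on the returned supplier returns the current value of the field and then increments it, so the state survives between calls and after incrementer() has returned.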
Upvotes: 8
Reputation: 103263
They both exist on the same stack and have a lifecycle together.
No they don't.
Here:
public class OhDearThatWasALieWasntIt {
    void haha() throws Exception {
        var supplier = incrementer(20);
        Thread t = new Thread() {
            public void run() {
                supplier.get();
            }
        };
        t.start(); // the lambda now runs on a different thread, with its own stack
    }
}
There you go. They don't share a stack at all. In fact, the start local var from your incrementer method needs to travel all the way from one thread to an entirely different one.
The simple fact is, the compiler has no idea where that lambda is going to end up and who shall run it.
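A compilable variant of the same idea, as a rough sketch: the class CrossThreadCapture and its two methods are hypothetical names, and the lambda only reads start (so start stays effectively final). The supplier is created in one method and executed on another thread, long after the frame that held start is gone.

```java
import java.util.function.Supplier;

// Hypothetical names, for illustration only.
class CrossThreadCapture {
    // start is effectively final here, so the lambda may capture (copy) it.
    static Supplier<Integer> fixedSupplier(int start) {
        return () -> start + 1;
    }

    // Runs the supplier on a freshly created thread and returns its result.
    static int runOnOtherThread(Supplier<Integer> supplier) {
        int[] result = new int[1];
        Thread t = new Thread(() -> result[0] = supplier.get());
        t.start();
        try {
            t.join(); // after join() returns, the write to result[0] is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

By the time runOnOtherThread calls get(), fixedSupplier has already returned, which is only possible because the lambda carries its own copy of start.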
Since the stack is allocated for each thread, there can be no concurrency issues.
Baeldung oversimplified, perhaps. If a local var used in a lambda were allowed to be non-final, there would be only 2 options:
[A] the lambda gets a clone, and this is incredibly confusing.
[B] the variable is hoisted onto the heap and we now have to allow volatile on local vars; the maxim that local vars cannot possibly be shared with other threads falls by the wayside, and concurrency issues abound.
Let's see this in action:
// Hypothetical: this does not compile, because local is not effectively final.
void meanCode() throws InterruptedException {
    int local = 100;
    Runnable r = () -> {
        for (int i = 0; i < 10; i++) {
            System.out.println(local++);
        }
    };
    Thread a = new Thread(r);
    a.start();
    Thread.sleep(5);
    for (int i = 0; i < 10; i++) {
        System.out.println(local++);
    }
}
Either local is now one variable used in 2 places, and thus the above code has a race condition, or a clone is handed out and both the Runnable and the for loop at the end of the snippet get their own copy of local. The latter is free of data races: each loop prints 100 through 109 in order, but the two print runs are arbitrarily interleaved (I guess there's a bit of a race condition left). The fact that you secretly have 2 variables is incredibly confusing.
Given that both options are utterly confusing, Java instead just doesn't allow it at all. With (effectively) final variables, Java gets to just give a copy to the lambda, thus neatly sidestepping any concurrency issues. It's also not confusing, as the variable is (effectively) final and so the copy can never differ from the original.
Yeah, you know that. How could the compiler possibly know that? The compiler (and runtime) work on single classes at a time. The compiler isn't going to 'treeshake' your entire project to painstakingly ensure that your code never ends up in a scenario where this stuff ends up in multiple threads. Even if it somehow did, perhaps later on someone recompiles half of this code base, or just adds a few more classes that now do.
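If shared mutable state really is what you want, one common way to get it is to make the heap allocation explicit. The sketch below assumes a hypothetical ExplicitSharedCounter class and uses java.util.concurrent.atomic.AtomicInteger. The local variable counter is effectively final, so the lambda still only receives a copy, but what gets copied is a reference to a mutable, thread-safe object.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical class for illustration only.
class ExplicitSharedCounter {
    static Supplier<Integer> incrementer(int start) {
        // counter itself is never reassigned, so it is effectively final;
        // the AtomicInteger it refers to lives on the heap and may change.
        AtomicInteger counter = new AtomicInteger(start);
        return counter::getAndIncrement;
    }
}
```

Nothing is hidden here: the code says in plain sight that the state lives on the heap and may be touched from several threads, which is exactly what a silently hoisted local would have obscured.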
Upvotes: 0