paiwmsn

Reputation: 97

Curious about lambda capture in Java

I read https://www.baeldung.com/java-lambda-effectively-final-local-variables and many other articles (including on Stack Overflow). However, several questions remain unanswered.

  1. I don't understand why a lambda captures (copies the value of) a local variable. The following code is taken from the article linked above.
Supplier<Integer> incrementer(int start) {
  return () -> start++;
}
// start is a local variable, and we are trying to modify it inside of a lambda expression.

They say:

Well, notice that we are returning the lambda from our method. Thus, the lambda won't get run until after the start method parameter gets garbage collected. Java has to make a copy of start in order for this lambda to live outside of this method.

The lifetime of the start variable is the execution of incrementer(). The variable and the method call exist on the same stack and share a lifecycle. So I don't understand why the article mentions garbage collection, or why the lambda would not run until later.

  2. Why must only local variables be final or effectively final? Baeldung says it is because of concurrency issues.

Since a stack is allocated for each thread, there can be no concurrency issues. If anything, why do local variables need to be final when static member variables, which can cause concurrency problems, do not?

Upvotes: 1

Views: 1810

Answers (2)

Mirek Pluta

Reputation: 8013

Capturing of the variable has absolutely nothing to do with the concurrent execution or its safety, the reason is completely different.

Before I answer your questions, let me first explain what a lambda expression is.

What is a lambda expression

When you use a lambda expression, a few things happen, both during compilation and at runtime, that are hidden from the developer. It's also worth noting that a lambda expression is part of the Java language; it doesn't exist as such in the generated bytecode.

I'll use the following code as an example:

import java.util.function.Function;

public class GreeterFactory {

    private String header = "Hello ";
    
    public Function<String, String> createGreeter(int greeterId){
        Function<String, String> greeter = username -> {
            return String.format("(%s) %s: %s", greeterId, header, username);
        };
        
        return greeter;
    }
}

A lambda expression is compiled into an anonymous method

When javac compiles Java into bytecode, it converts your lambda's body into a new method in the enclosing class (that's why lambda expressions can be thought of as anonymous methods).

Here's what ends up in the bytecode (as shown by the javap tool):

Compiled from "GreeterFactory.java"
public class various.GreeterFactory {
  private java.lang.String header;
  public various.GreeterFactory();
  public java.util.function.Function<java.lang.String, java.lang.String> createGreeter(int);
  private java.lang.String lambda$createGreeter$0(int, java.lang.String);
}

As you can see, the GreeterFactory class has not only the createGreeter method that I wrote, but also a lambda$createGreeter$0 method that was generated by the compiler.

One thing you may notice is that the generated method has two parameters (int and String), even though my lambda declares only one parameter, a String. The reason is that at runtime this method is called not only with the arguments I pass (when I call the apply method from the Function interface), but also with all the "captured" values. Which gets us to point 2:
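The generated method can also be observed from inside a running program via reflection. This is only a sketch: the class name LambdaMethodInspector and the helper syntheticMethods are made up for illustration, and the exact name and shape of the generated method are a javac implementation detail rather than something the language guarantees.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical class mirroring GreeterFactory, with a helper that lists
// the compiler-generated (synthetic) methods declared on it.
class LambdaMethodInspector {

    private String header = "Hello ";

    Function<String, String> createGreeter(int greeterId) {
        // Captures "this" (for header) and the local greeterId.
        return username -> String.format("(%s) %s: %s", greeterId, header, username);
    }

    // Collects every method on this class that the compiler marked as synthetic.
    static List<Method> syntheticMethods() {
        List<Method> result = new ArrayList<>();
        for (Method m : LambdaMethodInspector.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                result.add(m);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Method m : syntheticMethods()) {
            System.out.println(m.getName() + " takes " + m.getParameterCount() + " parameter(s)");
        }
    }
}
```

With javac, the listing should include a method for the lambda body that takes two parameters: the captured int and the String the lambda declares.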

Lambda expression in runtime

We already know that the lambda is converted into an actual method. Now the question is: what exactly do I get as the result of evaluating that lambda expression (besides the fact that it's something implementing the Function interface)?

The Function<String, String> greeter variable will actually point to an object that internally:

  • has a reference to this GreeterFactory object (so that it can later call a method on it)
  • holds all "captured" local variables that are used in the body of the lambda expression (in my example: the value of greeterId)
  • has a reference to the generated lambda$createGreeter$0 method

You can see this when you inspect that object in the debugger.

Notice that the greeter object holds exactly the two values I mentioned (a reference to this GreeterFactory object and the value 23 that was copied from greeterId). That's exactly what "capturing" means in the case of a lambda expression.

Later, when apply is executed on this object, it actually calls the lambda$createGreeter$0 method on this GreeterFactory object, passing all captured values plus the arguments that you pass into apply.
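The capturing object described above can be approximated by hand. The following is only a sketch of the idea, not what the runtime literally generates: the names DesugaredGreeterFactory, CapturingGreeter and greeterBody are hypothetical, and the real mechanism creates the object dynamically rather than from a hand-written class.

```java
import java.util.function.Function;

// Hand-written approximation of what the lambda in createGreeter turns into.
class DesugaredGreeterFactory {

    private String header = "Hello ";

    // Plays the role of lambda$createGreeter$0:
    // the captured value comes first, then the lambda's own parameter.
    private String greeterBody(int greeterId, String username) {
        return String.format("(%s) %s: %s", greeterId, header, username);
    }

    // Plays the role of the object that the greeter variable points to.
    private static final class CapturingGreeter implements Function<String, String> {
        private final DesugaredGreeterFactory factory; // captured "this"
        private final int greeterId;                   // copy of the local variable

        CapturingGreeter(DesugaredGreeterFactory factory, int greeterId) {
            this.factory = factory;
            this.greeterId = greeterId;
        }

        @Override
        public String apply(String username) {
            // Captured values plus the argument passed to apply.
            return factory.greeterBody(greeterId, username);
        }
    }

    Function<String, String> createGreeter(int greeterId) {
        return new CapturingGreeter(this, greeterId);
    }

    public static void main(String[] args) {
        Function<String, String> greeter = new DesugaredGreeterFactory().createGreeter(23);
        System.out.println(greeter.apply("Bob")); // prints "(23) Hello : Bob"
    }
}
```

Because greeterId is stored in a final field of the capturing object, the original local variable no longer needs to exist by the time apply is called.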

Back to questions

I hope the above explains what "capturing" is and how it works. Let's get to the point about final/effectively final.

Why captured variables must be effectively final.

Disclaimer: I didn't find any official information about this. It's just my assumption, so I may be wrong.

Notice that lambdas exist only at the Java language level, not in bytecode. Given how lambdas work (generation of a new method), I think it would be technically possible to capture non-effectively-final variables as well.

I think the designers of lambda expressions chose this approach mainly to help developers write bug-free code.

If captured variables were not effectively final, meaning they could be modified outside the lambda as well as within it, that could lead to a lot of confusion and misunderstanding on the developers' side, and ultimately to many bugs. For example, developers could expect that changing a variable's value within the lambda affects that variable in the scope of the outer method (because the language does not make it visible that the body of the lambda actually runs in the scope of the newly generated method), or they could expect the opposite. In short: total chaos.

I think that's the reason behind the decision, and that's why the compiler and the language enforce it, i.e. by treating the lambda's scope and the enclosing method's scope as one (even though at runtime those are different scopes).

Notice that the same was already true for variables captured by anonymous classes, so developers were familiar with this approach.

Why can a lambda freely modify fields of the object? Because it's just a method within the class of that object and, like any other method, it has free access to all its members. It would be confusing to expect different behavior.
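To make the contrast with the question's incrementer concrete, here is a minimal sketch in which the counter is a field instead of a local variable. The class FieldCounter and its methods are hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

// A counter whose state lives in a field rather than in a local variable.
class FieldCounter {

    private int count; // instance field, stored in the object on the heap

    FieldCounter(int start) {
        this.count = start;
    }

    // Compiles fine: the lambda captures "this" and mutates the field through it.
    Supplier<Integer> incrementer() {
        return () -> count++;
    }

    int current() {
        return count;
    }

    public static void main(String[] args) {
        FieldCounter counter = new FieldCounter(5);
        Supplier<Integer> next = counter.incrementer();
        System.out.println(next.get());        // prints 5
        System.out.println(next.get());        // prints 6
        System.out.println(counter.current()); // prints 7
    }
}
```

The only thing captured here is the this reference, which never changes, so the effectively-final rule is not violated.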

Upvotes: 8

rzwitserloot

Reputation: 103263

They both exist on the same stack and have a lifecycle together.

No, they don't.

Here:

public class OhDearThatWasALieWasntIt {
   void haha() throws Exception {
     // incrementer is the method from the question
     var supplier = incrementer(20);
     Thread t = new Thread() {
       public void run() {
         supplier.get();
       }
     };
     t.start(); // the lambda body now runs on a different thread, with its own stack
   }
}

There you go. They don't share a stack at all. In fact, the start local variable from incrementer needs to travel all the way from one thread to an entirely different one.

The simple fact is, the compiler has no idea where that lambda is going to end up and who shall run it.
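A compilable variant of the same idea, as a sketch: since the question's start++ is rejected by the compiler, the hypothetical nextAfter below only reads its parameter. The class name OutlivesTheMethod and the helper runOnOtherThread are likewise made up for illustration.

```java
import java.util.function.Supplier;

class OutlivesTheMethod {

    // "start" is effectively final here, so the lambda may capture a copy of it.
    static Supplier<Integer> nextAfter(int start) {
        return () -> start + 1;
    }

    // Runs the supplier on a different thread, after nextAfter has already returned
    // and its stack frame is gone.
    static int runOnOtherThread(Supplier<Integer> supplier) {
        int[] result = new int[1];
        Thread t = new Thread(() -> result[0] = supplier.get());
        t.start();
        try {
            t.join(); // join makes the worker thread's write visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        Supplier<Integer> supplier = nextAfter(20);
        System.out.println(runOnOtherThread(supplier)); // prints 21
    }
}
```

The value 20 is still available on the worker thread only because the lambda carries its own copy of it.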

Since the stack is allocated for each thread, there can be no concurrency issues.

Baeldung oversimplified, perhaps. If a local var used in a lambda were allowed to be non-final, there would be only 2 options:

[A] the lambda gets a clone, and this is incredibly confusing.

[B] the variable is hoisted onto the heap and we now have to allow volatile on local vars; the maxim that local vars cannot possibly be shared with other threads falls by the wayside, and concurrency issues abound.

Let's see this in action:

// Hypothetical: javac rejects this, because local is not effectively final.
void meanCode() throws InterruptedException {
  int local = 100;
  Runnable r = () -> {
    for (int i = 0; i < 10; i++) {
      System.out.println(local++);
    }
  };

  Thread a = new Thread(r);
  a.start();
  Thread.sleep(5);
  for (int i = 0; i < 10; i++) {
    System.out.println(local++);
  }
}

Either local is now one variable used in 2 places, and the above code is a race condition; or a clone is handed out, and both the Runnable and the for loop at the end of the snippet get their own copy of local. In the second case the counting is race-condition free and each of them prints 100 through 109 in order, but the two print runs are arbitrarily interleaved (so I guess there's a bit of a race condition left). The fact that you secretly have 2 variables is incredibly confusing.

Given that both options are utterly confusing, Java instead just doesn't allow it at all. With (effectively) final variables, Java gets to simply give a copy to the lambda, neatly sidestepping any concurrency issues. It's also not confusing, as the variable is (effectively) final and the copy can never differ from the original.
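If shared mutable state really is what you want, a common workaround (not something from the article under discussion) is to make the sharing explicit by capturing a final reference to a heap object such as java.util.concurrent.atomic.AtomicInteger. A minimal sketch, with the hypothetical names ExplicitSharing and countFromTwoThreads:

```java
import java.util.concurrent.atomic.AtomicInteger;

class ExplicitSharing {

    // Increments one shared counter 10 times from each of two threads
    // and returns the final value.
    static int countFromTwoThreads() {
        // The reference is final, so it can be captured; the object it points to is mutable.
        final AtomicInteger shared = new AtomicInteger(100);

        Runnable r = () -> {
            for (int i = 0; i < 10; i++) {
                shared.getAndIncrement(); // atomic, so no increments are lost
            }
        };

        Thread a = new Thread(r);
        a.start();
        r.run(); // the same work on the calling thread
        try {
            a.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(countFromTwoThreads()); // prints 120
    }
}
```

Here the lambda still receives a copy, but it is a copy of the reference, and the code makes it visible that the counter lives on the heap and is shared between threads.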

But there are no threads here!

Yeah, you know that. How could the compiler possibly know that? The compiler (and the runtime) work on single classes at a time. The compiler isn't going to 'treeshake' your entire project to painstakingly ensure that your code never ends up in a scenario where this stuff is used from multiple threads. Even if it somehow did, perhaps later on someone recompiles half the code base, or just adds a few more classes that now do use it that way.

Upvotes: 0
