Reputation: 2163

Perl scoping - accessing variables in subroutine

I am doing some code golf, and decided to try and be 'smart' and declare a subroutine where it would have the variables it needs already in scope, to avoid the extra code of having to pass in the arguments:

#! perl

use strict;
use warnings;


for my $i(0..1) {
  my @aTest = (1);

  sub foo {
    # first time round - @aTest is (1)
    # second time round - @aTest is (1,2)

    push @aTest, 2;

    # first time round - @aTest is (1,2)
    # second time round - @aTest is (1,2,2)

    my $unused = 0;
  }

  foo();
}

foo sees the variable @aTest, and it has the expected value of (1) the first time I enter foo, before it pushes 2 onto the array as well. @aTest now looks like (1,2).

So far so good.

Then we exit foo, and commence the second round of the for loop. @aTest is reassigned to (1) again.

We enter foo for the second time, but @aTest retains the value it previously had in foo (i.e. (1,2)), and then pushes another 2 on to become (1,2,2).

What's going on here?

I assumed that since @aTest was in the same scope, it would refer to the same variable both inside and outside foo. So how is it that inside foo it retains its old value?

Upvotes: 3

Answers (3)

ikegami

Reputation: 386696

Before the explanation, the lesson to learn here is to not place named subs inside of loops or other subs. (Anonymous subs are fine.)

Simplified example:

use feature 'say';

for (1..3) {
   my $var = $_;
   sub foo { say $var; }
   foo();
}

Output:

1
1
1

for (1..3) {
   my $var = $_;
   sub foo { say $var; }
   foo();
}

is equivalent to

for (1..3) {
   my $var = $_;
   BEGIN { *foo = sub { say $var; }; }
   foo();
}

As this makes clearer, $var is captured at compile-time.

But how can $var exist at compile-time? Doesn't it get created every pass of the loop? No. Something you have to realize is that my creates the variable at compile-time. At run-time, it merely places a directive on the stack that causes the variable to be cleared when the directive is popped from the stack. This allows the same $var to be used every pass of the loop, which is great because allocating and deallocating scalars is rather expensive.

Now, if what I said was true, then the following would print 3 three times because @a would contain three references to the same variable:

my @a;
for (1..3) {
   my $x = $_;
   push @a, \$x;
}

say $$_ for @a;

However, it prints 1, 2 and 3 as expected. Remember that directive my places on the stack? It's smarter than I mentioned. If the variable contains an object, or if the variable is still being referenced by something other than the file/sub it's in (e.g. when it's been captured), then it's replaced with a fresh variable instead of being cleared.
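The clear-versus-replace behaviour described above can be observed with a small sketch. The names @reused, @kept and %distinct are made up for illustration, and whether the first loop really shows the same address every pass is an implementation detail of the perl you run it on, so treat that part as something to observe rather than rely on:

```perl
use strict;
use warnings;

# Nothing outside the loop body refers to $x at scope exit, so perl is
# free to clear it and hand out the same scalar again on the next pass.
my @reused;
for (1..3) {
    my $x = $_;
    push @reused, "" . \$x;   # keep only the address as a string
}

# @kept holds real references, so every pass has to get a fresh $y.
my @kept;
for (1..3) {
    my $y = $_;
    push @kept, \$y;
}

# Count how many different scalars @kept points at.
my %distinct;
$distinct{"$_"}++ for @kept;

print "addresses without outside references: @reused\n";
print "distinct scalars when references are kept: ", scalar(keys %distinct), "\n";
print "values: ", join(", ", map { $$_ } @kept), "\n";
```

The second loop is the one with a guaranteed result: three scalars that are all still alive cannot share an address, so it reports three distinct scalars holding 1, 2 and 3.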

What this means,

                         First pass            Second pass           Third pass
for (1..3) {             --------------------  --------------------  --------------------
  my $var = $_;          Orig $var assigned 1  New $var assigned 2   Same $var assigned 3
  say \$var;             SCALAR(0x996da8)  !=  SCALAR(0x959b78)  ==  SCALAR(0x959b78)
  sub foo { say $var; }  Prints captured $var  Prints captured $var  Prints captured $var
  foo();
}                        $var is replaced      $var is cleared       $var is cleared
                         because REFCNT=2      because REFCNT=1      because REFCNT=1

In contrast, try

for (1..3) {
   my $var = $_;
   my $foo = sub { say $var; };   # Captures $var at runtime.
   $foo->();
}

Output:

1
2
3
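Applied to the code in the question, the same change might look like the sketch below. The @seen array is a hypothetical addition that only records what @aTest looked like after each call:

```perl
use strict;
use warnings;

my @seen;   # hypothetical helper: state of @aTest after each call
for my $i (0..1) {
    my @aTest = (1);

    # Anonymous sub: created at run time on every pass, so it closes
    # over the @aTest that belongs to the current pass.
    my $foo = sub {
        push @aTest, 2;
    };

    $foo->();
    push @seen, "(@aTest)";
    print "pass $i: (@aTest)\n";
}
```

Both passes now print (1 2), because the sub and the loop body are always looking at the same array.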

Upvotes: 4

simbabque

Reputation: 54381

I posted this question into #p5p on irc.perl.org and got an interesting exchange that explains what's going on.

[15:13:35] <simbabque> can someone explain what's going on in Perl scoping - accessing variables in subroutine ? I've tried to read the output of B::Concise for that program, but my understanding is not strong enough. Could the behaviour we see there be a bug?
[15:15:35] <rjbs> &foo is a closure over @aTest.
[15:16:33] <haarg> but only the first @aTest, because &foo is created once at compile time
[15:17:01] <rjbs> Right.
[15:18:09] <rjbs> declaring a named sub inside of anything other than a package or bare block is, in my experience, asking for future heartache
[15:18:23] <alh> If you had my $foo = sub { }; $foo->(); it would work as expected
[15:18:34] <alh> Or in newer perls, with use feature qw(lexical_subs); my sub foo { } foo() would also work
[15:18:35] <simbabque> well the guy said he came across it while golfing
[15:19:13] <simbabque> alh: with both of those I had also expected it to be a closure, but because the sub foo {} gets done at compile time I was confused
[15:19:38] <rjbs> lexical subs "do the right thing" with regard to binding
[15:19:45] <alh> Still a closure, just re-evaluted every time
[15:20:56] <haarg> they share the op tree, but are bound to different variables
[15:23:29] <simbabque> I added say "foo: ".\@aTest; and say "out ".\@aTest; inside and outside the function. That's also weird. the first round both are the same, then foo keeps the same one, and the one in the loop gets a new address and keeps it in subsequent iterations
[15:26:48] <alh> Sure, the first run through the loop, the variable that's closed over in the sub is the same one the loop sees
[15:27:01] <alh> Then we loop, and get a brand new one, but the sub doesn't (because it's not compiled again)
[15:27:46] <simbabque> alh: that makes sense, but why do all subsequent iterations reuse the same variable but reset it in the loop? Is that just Perl being smart with its memory?
[15:28:48] <alh> No, your sub has closed over a variable and maintains a reference to it - so it never goes away and its value is maintained across sub calls
[15:29:07] <alh> There's no "my @aTest" in the sub to "reset" the variable
[15:29:19] <alh> So it just keeps its value - that's the point of a closure
[15:29:21] <simbabque> alh: I meant the one in the loop, not in the sub
[15:29:41] <alh> What's happening to the one in the loop that is surprising you?
[15:29:52] <simbabque> out: ARRAY(0x2356300) foo: ARRAY(0x2356300) out: ARRAY(0x20978b0) foo: ARRAY(0x2356300) out: ARRAY(0x20978b0) foo: ARRAY(0x2356300) out: ARRAY(0x20978b0) foo: ARRAY(0x2356300)
[15:30:13] <simbabque> the first round they are the same
[15:30:33] <simbabque> then out (that's the my @aTest in the loop) gets a new address, but the round after that it stays the same
[15:30:49] <haarg> the address stays the same
[15:30:54] <haarg> that doesn't mean it's the same variable
[15:31:02] <haarg> more "gets the same address again"
[15:31:12] <simbabque> so Perl just reuses that address?

The output I am referring to is from this modification:

for my $i(0..3) {
  my @aTest = (1);

  sub foo {
    push @aTest, 2;
    my $unused = 0;
    print " foo: ".\@aTest;
  }
  print " out: ".\@aTest;

  foo();
}

So it's essentially building a closure over the @aTest at compile time. In the first iteration, the variable in the loop is the same as in the sub. In all subsequent iterations it creates a new variable in the loop, so we see a fresh (1) every time. But the sub does not get compiled again, so the @aTest variable in there stays the same and grows.
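For completeness, here is a sketch of the lexical sub variant alh mentioned in the chat. It assumes perl 5.18 or newer; the feature was experimental until 5.26, hence the extra no warnings line. The @seen array is a hypothetical addition used only to record the results:

```perl
use strict;
use warnings;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';   # only needed before 5.26

my @seen;   # hypothetical helper: state of @aTest after each call
for my $i (0..3) {
    my @aTest = (1);

    # A "my" sub is re-created each time the enclosing block is entered,
    # so it binds to the @aTest of the current pass.
    my sub foo {
        push @aTest, 2;
    }

    foo();
    push @seen, "(@aTest)";
    print "pass $i: (@aTest)\n";
}
```

As rjbs and alh describe, this should behave like the anonymous sub version and show (1 2) on every pass, while still letting you call foo() by name.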

Upvotes: 2

Shaka Flex

Reputation: 111

You nested the subroutine so it will not behave the way you expected.

Named subroutines are stored in the package's symbol table at compile time, so they are globally visible. In your example foo(); is shorthand for main::foo();. To limit the visibility of a function to a scope, you need to assign an anonymous subroutine to a variable.

Both named and anonymous subroutines can form closures, but since named subroutines are compiled only once, nesting them produces behaviour that many people don't expect.
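A minimal sketch of the visibility difference, using the made-up names named and $anon:

```perl
use strict;
use warnings;

{
    # Installed as main::named at compile time, regardless of the block.
    sub named { return "named" }

    # Reachable only through $anon, which is scoped to this block.
    my $anon = sub { return "anonymous" };
    print $anon->(), "\n";
}

# Outside the block the named sub is still callable, with or without
# the package prefix. $anon no longer exists here.
print named(), "\n";
print main::named(), "\n";
```

Trying to use $anon after the closing brace would fail to compile under use strict, which is exactly the scoping that a named sub does not give you.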

Please read the rest here.

Upvotes: 1
