Evan Carroll

Reputation: 1

Why does Perl allow functions to be declared without being defined?

Perl has a notion of an undefined function: a function that is declared but not defined.

sub foo;
foo(); # Undefined subroutine error
bar(); # Undefined subroutine error

The function foo now exists in the symbol table, and it can be used to resolve a method call. But why does this "feature" even exist? In C, it's because function calls are type-checked, and sometimes you want to call a function before you define it (for example, to resolve a circular dependency). But Perl has no such type checking, and all function symbols are resolved at runtime, not at compile time.

Upvotes: 22

Views: 2715

Answers (5)

tchrist

Reputation: 80384

Because Perl is a one-pass compiler

The reason "Perl has a notion of an undefined function" is that it's a one-pass compiler. All else follows from this simple principle. That's why:

printf "it is '%s'\n", some_function();

is a syntactically legal statement as far as the compiler is concerned. This is easily verified via perl -c to just compile but not run the code:

% perl -ce 'printf "it is '%s'\n", some_function()'
-e syntax OK

Sure, if you tried to run that, the interpreter would die because you tried to call an undefined subroutine, but that's not really the compiler's business. For further insight, you should examine the compiler's resulting parse tree using the B::Concise module:

% perl -MO=Concise,-exec -e 'printf "it is '%s'\n", some_function()'
1  <0> enter 
2  <;> nextstate(main 1 -e:1) v:{
3  <0> pushmark sM
4  <$> const(PV "it is %s\n") sM
5  <0> pushmark s
6  <$> gv(*some_function) s/EARLYCV
7  <1> entersub[t2] lKMS/LVINTRO,TARG,INARGS
8  <@> prtf vK
9  <@> leave[1 ref] vKP/REFC
-e syntax OK

Look specifically at opcode 6: gv(*some_function) s/EARLYCV. That's telling you this was a coderef that was used before the compiler saw a definition for it.

The very same parse tree is obtained by placing the subroutine definition after the code that calls it:

% perl -MO=Concise,-exec -e 'printf "it is '%s'\n", some_function(); sub some_function { time }'
1  <0> enter 
2  <;> nextstate(main 1 -e:1) v:{
3  <0> pushmark sM
4  <$> const(PV "it is %s\n") sM
5  <0> pushmark s
6  <$> gv(*some_function) s/EARLYCV
7  <1> entersub[t2] lKMS/LVINTRO,TARG,INARGS
8  <@> prtf vK
9  <@> leave[1 ref] vKP/REFC
-e syntax OK

This is quite different when the compiler already knows what coderef that name is bound to at compile time, which you can effect by placing the definition before the code that calls it:

% perl -MO=Concise,-exec -e 'sub some_function { time } printf "it is '%s'\n", some_function()'
1  <0> enter 
2  <;> nextstate(main 3 -e:1) v:{
3  <0> pushmark sM
4  <$> const(PV "it is %s\n") sM
5  <0> pushmark s
6  <$> gv(IV \&main::some_function) s
7  <1> entersub lKMS/LVINTRO,INARGS
8  <@> prtf vK
9  <@> leave[1 ref] vKP/REFC
-e syntax OK

Now look what has happened to opcode 6! It has become gv(IV \&main::some_function) s. Now the interpreter won't have to look that coderef up at runtime. The compiler has already provided it.

If you merely declare the function before the compiler sees you use it, the compiler still can't know which coderef the name resolves to; that has to wait until runtime:

% perl -MO=Concise,-exec -e 'sub some_function; printf "it is '%s'\n", some_function(); sub some_function { time }'
1  <0> enter 
2  <;> nextstate(main 1 -e:1) v:{
3  <0> pushmark sM
4  <$> const(PV "it is %s\n") sM
5  <0> pushmark s
6  <$> gv(*some_function) s
7  <1> entersub[t2] lKMS/LVINTRO,TARG,INARGS
8  <@> prtf vK
9  <@> leave[1 ref] vKP/REFC
-e syntax OK

Now opcode 6 reads gv(*some_function) s, because the interpreter still has to look it up in the package symbol table to find the coderef. The compiler wasn't able to provide the coderef's address to the interpreter.

You might find this surprising, given that you can clearly see the function definition yourself later on. But the compiler cannot.

Why not?

It's what I said at the beginning: because Perl is a one-pass compiler, that's why. That's the answer to your question.

All discussions about function prototypes, AUTOLOAD intercepts, and method resolution are distractions that get lost in the weeds. They describe several interesting ramifications that follow naturally from this initial principle. While these are all perfectly valid — and valuable — observations, they ultimately fail to answer your question because they do not identify the unitary cause behind it all: the single-pass nature of the Perl compiler.
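A minimal sketch of what that runtime lookup means in practice. The names call_late and late_bound are hypothetical and not part of the examples above. The call is compiled before any definition exists, fails while the name is still unbound, and succeeds once a coderef has been installed in the symbol table:

```perl
use strict;
use warnings;

# The call to late_bound() is compiled here, before any definition is seen,
# so the coderef has to be looked up through the glob each time it runs.
sub call_late { return late_bound() }

# First attempt: nothing is bound to the name yet, so the call dies at runtime.
my $first = eval { call_late() };
my $error = $@;

# Install a definition at runtime (late_bound is a hypothetical name).
*late_bound = sub { return "resolved at runtime" };

# Second attempt: the same compiled call now finds the coderef.
my $second = call_late();

print "first call died\n" if !defined $first && $error =~ /Undefined subroutine/;
print "second call: $second\n";
```

Nothing here depends on prototypes or AUTOLOAD; it only relies on the call site holding the name rather than the coderef.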

Upvotes: 6

hobbs

Reputation: 239781

It is about prototypes, and the declaration (or not) of a function does have an effect at compile time. Consider

  1.  print foo + 42;

    In isolation, this is equivalent to print('foo' + 42); foo is a "bareword". If you have strict 'subs' enabled, it will instead give you a compilation error saying that barewords are forbidden.

  2.  sub foo;
      print foo + 42;

    This is equivalent to print(foo(42)); the compiler knows that foo is a sub and it has no prototype, so it consumes everything after it in "list op" fashion, and what follows it is the term +42.

  3.  sub foo();
      print foo + 42;

    This is equivalent to print(foo() + 42); the compiler knows that foo has a prototype and that it takes no arguments, so none will be looked for, and foo and 42 will be the operands of the + operator.

  4.  sub foo($);
      print foo + 42;

    Like case 2, this is equivalent to print(foo(42)). I think there's probably a test I could have used to distinguish them.

Point being, whether a sub is known or not does have effects at compile-time, and Perl gives you the option to declare that fact before you define the body of the sub, rarely needed as it may be.
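The cases above can be put side by side in one file. This is only a sketch: the sub names listop, nullary and unary are made up so that each declaration style gets its own name, and the last two lines are one possible way to tell case 2 from case 4, on the assumption that a ($) prototype makes the sub parse like a named unary operator:

```perl
use strict;
use warnings;

sub listop;        # declared, no prototype: parsed as a list operator
sub nullary();     # declared with an empty prototype: takes no arguments
sub unary($);      # declared with a ($) prototype: takes exactly one argument

my @a = (listop + 42);    # parsed as listop(+42)
my @b = (nullary + 42);   # parsed as nullary() + 42
my @c = (listop 1, 2);    # parsed as listop(1, 2): one element
my @d = (unary 1, 2);     # parsed as (unary(1), 2): two elements

# The bodies come after the calls; only the declarations had to come first.
sub listop    { return "args=@_" }
sub nullary() { return 1 }
sub unary($)  { return "one=$_[0]" }

print "listop + 42:  @a\n";
print "nullary + 42: @b\n";
print "listop 1, 2:  ", scalar(@c), " element(s)\n";
print "unary 1, 2:   ", scalar(@d), " element(s)\n";
```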

As for why it has an impact on method resolution order: most likely it's a side effect, but it's not wrong. Forward-declaring a sub is supposed to mean that you intend to provide the definition before compilation is done. If you don't, then you will get a runtime error when you try to call it. It seems fair enough to me that if such a declaration is in a package in the MRO, then that means "there should be a method here, but I forgot", and you get an error when the MRO reaches that package.
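A small sketch of that method-resolution effect, using hypothetical packages Base and Middle. Middle only declares greet, so the lookup stops there instead of continuing on to the working Base::greet:

```perl
use strict;
use warnings;

package Base;
sub new   { return bless {}, shift }
sub greet { return "Base::greet" }

package Middle;
our @ISA = ('Base');
sub greet;                 # declared here, but never defined

package main;

my $obj    = Middle->new;             # new is inherited from Base
my $result = eval { $obj->greet };    # resolves to the Middle::greet stub
my $err    = $@;

print defined $result ? "got: $result\n" : "died: $err";
```

Removing the `sub greet;` line from Middle should make the same call reach Base::greet.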

Upvotes: 31

zdim

Reputation: 66873

It's the same as with other types, I'd say; just the artifact of how it is parsed.

my $hr = { a => 1 };   # $hr name introduced at compile time, assigned or not 

So the same goes with

sub name { ... };      # "name" "declared" at compile time

and saying just sub name; is about the same as saying my $hr; -- and then having a symbol with no definition attached to it.

I don't know how the parser works but I'd guess that it has to take sub name first and "bind" the definition later, so by happenstance we can then also say just sub name; and have that name.


I mean to say that this is the "reason", per the question "But why does this "feature" even exist?"

But then once it is known to the compiler ahead of time that there is a sub with that name then there may be various uses of that fact.
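To make the analogy a bit more concrete, here is a rough sketch (declared_only is a made-up name): exists reports that the sub name has been declared, while defined reports that no body is attached, much like a variable that has been introduced but not assigned.

```perl
use strict;
use warnings;

sub declared_only;          # name introduced, no body attached

my $hr;                     # variable introduced, no value assigned

my $sub_exists  = exists(&declared_only)  ? "yes" : "no";
my $sub_defined = defined(&declared_only) ? "yes" : "no";
my $var_defined = defined($hr)            ? "yes" : "no";

print "sub exists:  $sub_exists\n";
print "sub defined: $sub_defined\n";
print "var defined: $var_defined\n";
```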

Upvotes: 5

Todd Merriman

Reputation: 117

This has saved me a lot of debugging:

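use File::Basename qw(basename);   # provides basename()

our $AUTOLOAD;                     # set by Perl to the name of the missing sub

sub AUTOLOAD
{
   return if $AUTOLOAD =~ /::DESTROY$/;   # don't log implicit destructor calls
   my (undef,$filename,$lineno) = caller;
   my ($fn) = basename($filename);

   # logmsg() is the application's own logging routine (not shown here)
   logmsg('E',"Undefined reference ($fn/$lineno): ref=$AUTOLOAD, refer=$ENV{HTTP_REFERER}");
}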

Upvotes: 3

ikegami

Reputation: 385645

I can think of these reasons:

  • Allows placing the definition of a sub after a call to it if it has a prototype or attributes.

    Notable specifics:

    • Allows mutually recursive subs to exist if they have a prototype or attributes.
  • Allows AUTOLOADed and similar subs to be declared.

    Notable specifics:

    • Allows one to check if AUTOLOADed subs exist.
    • Allows AUTOLOADed and similar subs to be exported.
  • Can be used as an abstract method.
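A sketch of the AUTOLOAD point, with a hypothetical package Lazy whose fetch method is declared but only ever supplied by AUTOLOAD. The declaration is what lets can see the method, and calling it still ends up in AUTOLOAD:

```perl
use strict;
use warnings;

package Lazy;

sub fetch;                      # forward declaration; AUTOLOAD supplies the behaviour

our $AUTOLOAD;
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                  # strip the package qualifier
    return if $name eq 'DESTROY';       # ignore implicit destructor calls
    return "autoloaded $name";
}

package main;

print "can fetch:   ", (Lazy->can('fetch')   ? "yes" : "no"), "\n";
print "can missing: ", (Lazy->can('missing') ? "yes" : "no"), "\n";
print "call:        ", Lazy->fetch, "\n";
```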

Upvotes: 2
