Ed Avis

Reputation: 1482

Approximately syntax checking Perl code, faster than perl -c

Is there a way to syntax check a Perl program without running perl? The well-known answer is 'no': without starting the full perl runtime to evaluate code for imports and so on, you cannot tell whether a program's syntax is correct.

But what if you wanted an approximate answer? A syntax checker that says either 'bad' or 'maybe'. If 'bad', the program is definitely not valid perl code (assuming a vanilla perl interpreter). If 'maybe', it looks OK, but only perl itself can say for sure.

A program which always prints 'maybe' is clearly such a checker, but not a very useful one. A better attempt is to use PPI. There may be some valid Perl programs which PPI rejects, but if that happens it is, I think, treated as a PPI bug.

Digression: Why is this useful? One use might be a kwalitee check. To catch various "d'oh" moments, the version control system at $WORK runs all Perl code through perl -c before allowing the commit. (I am not recommending this as a general practice, just noting that it has been useful at our site.) But perl -c is unsafe, since it executes code (as it must). Using a conservative syntax checker instead would be safer, at the expense of some cases where the checker says 'maybe' but the program is not in fact valid Perl.

What I really want (end of digression): But in fact safety is not the motivating factor for my current application. I am interested in speed. Is there a way to roughly check and reject badly-formed Perl code before going to the expense of spinning up a whole Perl interpreter? PPI is slower than perl itself, so not a good candidate. You could write an approximate Perl grammar and use a parser generator to build a simple C program which accepts or rejects pseudo-Perl.

My application is 'delta debugging'. You start with a large program which has a certain property (segfaulting, for example) and knock out sections of it while preserving that property. I use http://delta.tigris.org/, which works in a simple-minded, line-oriented way. Many of the test cases it generates will not be valid Perl code. The delta debugging would go faster if these could be eliminated quickly, before the full perl executable is started.
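To make that concrete, here is a minimal sketch of the kind of test script delta could run, assuming the candidate file name arrives as the first argument (adapt to however your delta harness invokes its test) and assuming a fast pre-filter exists; 'maybe-checker' below is imaginary:

#!perl
# Hypothetical delta test script: exit 0 when the candidate still has
# the property of interest (here, segfaulting).
use strict;
use warnings;

my $candidate = shift or die "usage: $0 candidate.pl\n";

# Cheap pre-check: if the candidate is definitely not Perl, reject it
# without paying for perl startup. 'maybe-checker' is the imagined
# fast filter this question asks for.
exit 1 if system('maybe-checker', $candidate) != 0;

# Expensive real check: run perl and test for SIGSEGV (signal 11).
system('perl', $candidate);
exit((($? & 127) == 11) ? 0 : 1);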

Since the overhead of starting the perl interpreter is probably the biggest part of the time taken, you could implement some kind of server which listens on a socket, accepts program text, and returns 'bad' or 'maybe' by attempting to eval() the text or run it through PPI.
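A minimal sketch of such a server, assuming a one-program-per-connection protocol and an arbitrarily chosen port (both invented here):

#!perl
use strict;
use warnings;
use IO::Socket::INET;

my $server = IO::Socket::INET->new(
    LocalPort => 7357,   # port chosen arbitrarily
    Listen    => 5,
    ReuseAddr => 1,
) or die "listen: $!";

while (my $client = $server->accept) {
    # The client sends the program text and closes its write side.
    my $code = do { local $/; <$client> };

    # Wrapping in an anonymous sub makes the string eval compile the
    # body without running it; BEGIN blocks and 'use' still execute,
    # so this is fast rather than safe.
    my $maybe = eval "sub {\n$code\n}";
    print {$client} $maybe ? "maybe\n" : "bad\n";
    close $client;
}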

Another way to speed things up is to make perl fail faster. Normally it prints all the syntax errors and diagnostics it can find; if it instead stopped at the first one, some time would be saved.

But I do like the idea of a grammar for almost-Perl which could be checked by a simple C program. Does such a thing exist?

(Related: Perl shallow syntax check? ie. do not check syntax of imports. But my question is more about speed, and I am happy to have a rough check which accepts some invalid programs, as long as it does not reject valid ones.)

Upvotes: 3

Views: 948

Answers (2)

ysth

Reputation: 98378

If all you want is a quick compilability check, have a perl process that stays running to check each file for you:

perl -MFile::Slurp -lne'print 0+!! eval "sub {" . read_file($_) . "}"'
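This reads file names on standard input, one per line. Wrapping the file contents in sub { ... } makes the string eval compile the code without running the body (BEGIN blocks and use statements still execute, so it is fast rather than safe), and it prints 1 for each file that compiles and 0 for each that does not. Candidate files could be fed in with something like this (paths invented):

find candidates/ -name '*.pl' | perl -MFile::Slurp -lne'print 0+!! eval "sub {" . read_file($_) . "}"'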

Upvotes: 2

tobyink

Reputation: 13664

Given source filters, prototypes, and the Perl (5.14+) keyword API, imports can radically alter what syntax is valid and what is not. If you import anything, then such a check would be of very little use.

If you import nothing, then you can probably safely load all your external modules with require instead of use, and perl -c will become lightning fast (because require is processed at runtime).
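For example, a minimal sketch of the run-time-loading pattern (Some::Heavy::Module and frobnicate are made up):

#!perl
use strict;
use warnings;

# 'use Some::Heavy::Module;' here would be loaded and imported while
# perl -c compiles the file, slowing down every check.

sub do_work {
    # 'require' defers loading to run time, which perl -c never
    # reaches. No import() is called, so use fully qualified names.
    require Some::Heavy::Module;
    return Some::Heavy::Module::frobnicate(@_);
}

1;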

PPI is not especially useful here, because it takes a very forgiving, best-guess approach to parsing, so it will accept wildly invalid input without complaint:

#!perl
use strict;
use warnings;
use PPI::Document;
use PPI::Dumper;

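# Parse the clearly invalid 'foo q[}'; PPI guesses a tree and
# dumps it without complaint.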
PPI::Dumper->new(
   PPI::Document->new(\"foo q[}")
)->print;

Perl::Lexer might possibly be more helpful, though it will only detect code so broken that it can't even be tokenized. My previous example happens to be one of those, so this does complain:

#!perl
use strict;
use warnings;
use Perl::Lexer;

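# 'foo q[}' is too broken even to tokenize, so this does complain.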
print $_->inspect, $/
   for @{ Perl::Lexer->new->scan_string("foo q[}") };

Even so, things like the Perl keyword API, Devel::Declare, and source filters are applied prior to lexing, so if you import any modules that take advantage of these techniques, Perl::Lexer will be stuck. (Any of these techniques can easily make foo q[} valid syntax.)
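For instance, a source filter can textually rewrite foo q[} into something valid before perl ever parses it; a minimal sketch using Filter::Simple (the module name and replacement text are invented):

#!perl
# BrokenOK.pm
package BrokenOK;
use Filter::Simple;

# Rewrite the otherwise-unparseable text before compilation.
FILTER { s/foo q\[\}/print "filtered\\n";/g };

1;

# Meanwhile, in a script:
#   use BrokenOK;
#   foo q[}    # now compiles, and prints "filtered"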

Compiler::Lexer and Compiler::Parser may be of some use. The following dumps core:

#!perl
use strict;
use warnings;
use Compiler::Lexer;
use Compiler::Parser;

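# Tokenize, then parse, the broken snippet; at the time of writing,
# the parse step dumps core.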
my $t = Compiler::Lexer->new("eg.pl")->tokenize("foo q[}");
my $a = Compiler::Parser->new->parse($t);

If you correct the mismatched quotes in foo q[} to foo q[], it no longer dumps core. That counts as a result, of sorts. ;-)

Ultimately, the answer depends on what sort of code you're writing and what class of errors you're hoping to spot. perl -c will give you a fairly rigorous syntax check. Perl::Lexer may be faster, but there are big classes of errors it won't spot. Compiler::Lexer and Compiler::Parser might be useful in the future, but they seem to behave erratically right now.

Personally, I'd stick with perl -c, and if it's too slow try to cut down the number of modules you load at compile-time, in favour of run-time loading.

TL;DR: if you want static analysis, don't use Perl.

Upvotes: 3
