Reputation: 21

How can I strip all comments from a Perl script except for the shebang line?

I have a Perl script that strips comments from other Perl scripts:

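use strict;
use warnings;

my $file = shift;    # path of the Perl script to strip
open(my $infile, '<', $file) or die "Can't open $file: $!";
my @data = <$infile>;
close $infile;

foreach my $data (@data)
{
    # Replace everything from '#' to the end of the line with a space
    $data =~ s/#.*/ /g;
    print $data;
}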

The problem is, this code also removes the shebang line:

#!/usr/bin/perl

How can I strip comments except for the shebang?

Upvotes: 1

Answers (4)

Håkon Hægland

Reputation: 40778

The PPR module provides a function, PPR::decomment(), that can be used for this:

use strict;
use warnings;
use PPR;

my $document = <<'EOF';
print "\n###################################\n";
print '\n###################################\n';
print '\nFollowed by comment \n'; # The comment
return $function && $function !~ /^[\s{}#]/;
EOF

my $res = PPR::decomment( $document );
print $res;

Output:

print "\n###################################\n";
print '\n###################################\n';
print '\nFollowed by comment \n'; 
return $function && $function !~ /^[\s{}#]/;

Upvotes: 4

ikegami

Reputation: 386331

Since you asked for a regex solution:

'' =~ /(?{
   system("perltidy", "--delete-block-comments", "--delete-side-comments", $file);
   die "Can't launch perltidy: $!\n"                   if $? == -1;
   die "perltidy killed by signal ".( $? & 0x7F )."\n" if $? & 0x7F;
   die "perltidy exited with error ".( $? >> 8 )."\n"  if $? >> 8;
});

It seems like you are leaning towards using the following:

#!/usr/bin/perl
while (<>) {
   if ($. != 1) {
      s/#.*//;
   }
   print;
}

But it doesn't work on itself:

$ chmod u+x stripper.pl

$ stripper.pl stripper.pl >stripped_stripper.pl

$ chmod u+x stripped_stripper.pl

$ stripped_stripper.pl stripper.pl
Substitution pattern not terminated at ./stripped_stripper.pl line 4.

$ cat stripped_stripper.pl
#!/usr/bin/perl
while (<>) {
   if ($. != 1) {
      s/
   }
   print;
}

It also fails to remove comments on the first line:

$ cat >first.pl
# This is my first Perl program!
print "Hello, World!\n";

$ stripper.pl first.pl
# This is my first Perl program!
print "Hello, World!\n";
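The first-line problem in that loop can be narrowed by keeping line 1 only when it actually starts with #!. Below is a minimal sketch of that idea, wrapped in a hypothetical strip_naive helper (not from any answer here) that reads through an in-memory filehandle so that Perl still maintains $. for it. It deliberately keeps the same naive s/#.*//, so the self-stripping failure shown above is not fixed; only the shebang handling changes.

```perl
use strict;
use warnings;

# Hypothetical helper: strips '#' comments from a string of Perl source,
# but leaves line 1 alone only if it is a genuine "#!" shebang line.
# Still naive: '#' inside strings, regexes or $#array is stripped too.
sub strip_naive {
    my ($source) = @_;
    my $out = '';

    # Reading through a filehandle (here an in-memory one) makes Perl set $.
    open my $fh, '<', \$source or die "Can't open in-memory handle: $!";
    while (my $line = <$fh>) {
        # Skip the substitution only for a real shebang on line 1.
        unless ($. == 1 && $line =~ /^#!/) {
            $line =~ s/#.*//;
        }
        $out .= $line;
    }
    close $fh;
    return $out;
}
```

To run it on a single file named on the command line, something like print strip_naive(do { local $/; <> }); should work, since undefining $/ makes <> return the whole file at once.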

Upvotes: 2

Miller

Reputation: 35208

perltidy is the way to do this if it's anything but an exercise. There's also PPI for parsing Perl; you could use its PPI::Token::Comment tokens to do something more complicated than just stripping.

However, to answer your direct question: don't try to solve everything in a single regex. Instead, break your problem up into logical pieces. In this instance, if you want to skip the first line, process the file line by line; Perl then conveniently keeps the current line number in the special variable $. for you.

use strict;
use warnings;
use autodie;

my $file = '... your file...';

open my $fh, '<', $file;

while (<$fh>) {
    if ($. != 1) {
        s/#.*//;
    }

    print;
}

Disclaimer

The approach of using regexes for this problem is definitely flawed, as everyone has already said. However, I'm going to give your instructor the benefit of the doubt and assume that she or he is intentionally giving you a problem that is outside the purview of what regexes can do. Good luck finding all of those edge cases and figuring out how to deal with them.

Whatever you do, don't try to solve them all with a single regex. Break your problem up and use lots of ifs and elsifs.
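As an illustration of breaking the problem up with if/elsif branches, here is a sketch of one more piece: a hypothetical strip_line_comment helper (my own naming, not from the answer) that walks a single line character by character and only treats # as a comment start when it is outside a '...' or "..." string, honoring backslash escapes inside strings. It is still far from complete: q{}, qq{}, regexes such as s#a#b#, heredocs, POD and $#array are not handled, and it does not know about the shebang, so it would still need the $. check from the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing '#' comment from ONE line of Perl,
# ignoring '#' that appears inside single- or double-quoted strings.
# Known gaps: q{}, qq{}, regexes, s###, heredocs, POD, $#array.
sub strip_line_comment {
    my ($line) = @_;
    my $quote = '';    # '' outside a string, otherwise the opening quote char
    my $out   = '';
    my @chars = split //, $line;

    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($quote eq '' && $c eq '#') {
            # A comment starts here: drop the rest but keep the newline.
            $out .= "\n" if $line =~ /\n\z/;
            return $out;
        }
        elsif ($quote ne '' && $c eq '\\' && $i + 1 < @chars) {
            # Backslash escape inside a string: copy it and the next char as-is.
            $out .= $c . $chars[++$i];
            next;
        }
        elsif ($quote eq '' && ($c eq '"' || $c eq "'")) {
            $quote = $c;    # entering a string
        }
        elsif ($quote ne '' && $c eq $quote) {
            $quote = '';    # leaving the string
        }
        $out .= $c;
    }
    return $out;
}
```

Each new edge case (another quoting construct, POD blocks, heredocs) would become another branch or another piece, rather than a bigger regex.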

Upvotes: 3

ThisSuitIsBlackNot

Reputation: 24073

Writing code to strip comments is not trivial, since the # character can appear in contexts other than comments (inside strings, as a regex delimiter, in $#array, and so on). Use perltidy instead:

perltidy --delete-block-comments --delete-side-comments foo

will strip # comments (but not POD) from file foo and write the output to foo.tdy. The shebang is not stripped.
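To make the point about # concrete, here is a small demonstration (not part of the original answer) of what the naive s/#.*// rule used in other answers on this page does to three valid lines of Perl in which # does not start a comment. It uses s///r, which returns a modified copy and assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Valid Perl lines in which '#' does NOT start a comment.
my @lines = (
    q{my $last = $#items;},           # last index of an array
    q{print "Item #1\n";},            # '#' inside a string
    q{$path =~ s#/tmp#/var/tmp#;},    # '#' used as a regex delimiter
);

# Apply the naive "everything after # is a comment" rule to a copy of each line.
my @naive = map { s/#.*//r } @lines;

printf "%-30s => %s\n", $lines[$_], $naive[$_] for 0 .. $#lines;
```

Every one of the three lines comes out truncated into invalid Perl, which is exactly the kind of damage perltidy avoids because it actually tokenizes the code.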

Upvotes: 14
