Reputation: 21
I have a Perl script that strips comments from other Perl scripts:
open (INFILE, $file);
@data = <INFILE>;
foreach $data (@data)
{
$data =~ s/#.*/ /g;
print "$data";
}
The problem is, this code also removes the shebang line:
#!/usr/bin/perl
How can I strip comments except for the shebang?
Upvotes: 1
Views: 2013
Reputation: 40778
There is a function, PPR::decomment(), that can be used:
use strict;
use warnings;
use PPR;
my $document = <<'EOF';
print "\n###################################\n";
print '\n###################################\n';
print '\nFollowed by comment \n'; # The comment
return $function && $function !~ /^[\s{}#]/;
EOF
my $res = PPR::decomment( $document );
print $res;
Output:
print "\n###################################\n";
print '\n###################################\n';
print '\nFollowed by comment \n';
return $function && $function !~ /^[\s{}#]/;
Upvotes: 4
Reputation: 386331
Since you asked for a regex solution:
'' =~ /(?{
system("perltidy", "--delete-block-comments", "--delete-side-comments", $file);
die "Can't launch perltidy: $!\n" if $? == -1;
die "perltidy killed by signal ".( $? & 0x7F )."\n" if $? & 0x7F;
die "perltidy exited with error ".( $? >> 8 )."\n" if $? >> 8;
})/;
It seems like you are leaning towards using the following:
#!/usr/bin/perl
while (<>) {
if ($. != 1) {
s/#.*//;
}
print;
}
But it doesn't work on itself:
$ chmod u+x stripper.pl
$ stripper.pl stripper.pl >stripped_stripper.pl
$ chmod u+x stripped_stripper.pl
$ stripped_stripper.pl stripper.pl
Substitution pattern not terminated at ./stripped_stripper.pl line 4.
$ cat stripped_stripper.pl
#!/usr/bin/perl
while (<>) {
if ($. != 1) {
s/
}
print;
}
It also fails to remove comments on the first line:
$ cat >first.pl
# This is my first Perl program!
print "Hello, World!\n";
$ stripper.pl first.pl
# This is my first Perl program!
print "Hello, World!\n";
Upvotes: 2
Reputation: 35208
perltidy is the way to do this if it's anything but an exercise. There's also PPI for parsing Perl; you could use its PPI::Token::Comment token to do something more complicated than just stripping.
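As a rough illustration of the PPI route, here is a minimal sketch that deletes every PPI::Token::Comment except a shebang on line 1. The helper name strip_comments_ppi is hypothetical, and the sketch assumes PPI tokenizes the shebang as an ordinary comment token and that PPI is installed from CPAN. POD is a different token class, so it is left alone.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical helper (not part of PPI): remove comments, keep the shebang.
sub strip_comments_ppi {
    my ($source) = @_;

    # A scalar reference tells PPI to parse source text, not a file name.
    my $doc = PPI::Document->new(\$source)
        or die "PPI could not parse the source\n";

    # find() returns an array reference, or a false value if nothing matches.
    my $comments = $doc->find('PPI::Token::Comment') || [];

    # Decide what to delete before touching the tree.
    # Assumption: a shebang is a comment token on line 1 starting with '#!'.
    my @doomed = grep {
        !( $_->line_number == 1 && $_->content =~ /^#!/ )
    } @$comments;
    $_->delete for @doomed;

    return $doc->serialize;
}

print strip_comments_ppi(<<'EOF');
#!/usr/bin/perl
# a full-line comment
print "###\n"; # a side comment
EOF
```

Because PPI works on tokens, the # characters inside the quoted string should survive, which is exactly where a plain s/#.*// goes wrong.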
However, to answer your direct question: don't try to solve everything in a single regex. Instead, break your problem up into logical pieces. In this instance, if you want to skip the first line, do so with line-by-line processing, which conveniently keeps the current line number in the $. variable.
use strict;
use warnings;
use autodie;
my $file = '... your file...';
open my $fh, '<', $file;
while (<$fh>) {
if ($. != 1) {
s/#.*//;
}
print;
}
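One possible refinement, sketched here as an assumption rather than a complete solution: skip the first line only when it really starts with #!, so that an ordinary comment on line 1 is still removed. The helper name strip_comments_naive is hypothetical. It is still a regex approach, so it mangles any # that appears inside a string, regex or heredoc.

```perl
use strict;
use warnings;

# Hypothetical helper: strip '#' to end of line, but leave a shebang intact.
# Naive on purpose: a '#' inside a string or regex is still treated as a comment.
sub strip_comments_naive {
    my ($source) = @_;

    # split /^/ breaks the text into lines and keeps each trailing newline.
    my @lines = split /^/, $source;

    for my $i (0 .. $#lines) {
        # Leave line 1 alone only when it is a real shebang.
        next if $i == 0 && $lines[$i] =~ /^#!/;
        $lines[$i] =~ s/#.*//;    # '.' does not match the newline, so it stays
    }
    return join '', @lines;
}

print strip_comments_naive(<<'EOF');
#!/usr/bin/perl
# a full-line comment
print "Hello\n"; # a side comment
EOF
```

Stripped lines are left behind as empty lines, and a side comment leaves its preceding whitespace in place.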
Disclaimer
The approach of using regexes for this problem is definitely flawed, as everyone has already said. However, I'm going to give your instructor the benefit of the doubt and assume that she/he is aiming to teach by intentionally giving you a problem that is outside the purview of what regexes can do. Good luck finding all of those edge cases and figuring out how to deal with them.
Whatever you do, don't try to solve them using a single regex. Break your problem up and use lots of ifs and elsifs.
Upvotes: 3
Reputation: 24073
Writing code to strip comments is not trivial, since the # character can be used in contexts other than comments. Use perltidy instead:
perltidy --delete-block-comments --delete-side-comments foo
will strip # comments (but not POD) from file foo and write the output to foo.tdy. The shebang is not stripped.
Upvotes: 14