Reputation: 387
I'm running code that reads files and does some parsing, but it needs to ignore all comments. There are good explanations of how to do this, such as the answer to "How can I strip multiline C comments from a file using Perl?":
$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
print;
My first problem is that after running the line $/ = undef; the rest of my code doesn't work properly.
I don't actually know what that line does, but it would help if I could restore the previous value once the comments have been stripped.
In general, what is a good way to ignore all comments without changing the rest of the code?
Upvotes: 1
Views: 809
Reputation: 6252
If you are stripping "nested" comments, for example:
/* This is a comment
/* that has been re-commented */ possibly /* due to */
various modifications */
a regexp may not be the best solution, especially if the comment spans multiple lines as in the example above.
The last time I had to do something like this, I read the lines one at a time, keeping a count of how many levels of "/*" (or whatever the delimiter was for the specific language) were currently open, and not printing anything unless the count was at 0.
Here is an example. I apologize in advance because it's pretty bad Perl, but it should at least give you the idea:
use strict;
use warnings;

my $infile = $ARGV[0];    # File name

# Slurp the input file into an array, one line per element
open(my $fh, '<', $infile) or die "Opening $infile: $!";
my @INPUT_ARRAY = <$fh>;
close($fh) or die "Closing $infile: $!";

my @ARRAY;

# Removes all kinds of /* ... */ comments (single-line, multi-line, nested).
# Further parsing will be carried out on the stripped lines (in @ARRAY), but
# the error messaging routine will reference the original @INPUT_ARRAY,
# so line fragments may contain comments.
my $commentLevel = 0;
for (my $i = 0; $i < @INPUT_ARRAY; $i++)
{
    my @explodedLine = split(//, $INPUT_ARRAY[$i]);
    my $resultLine   = "";
    for (my $j = 0; $j < @explodedLine; $j++)
    {
        my $char = $explodedLine[$j];
        my $next = $explodedLine[$j + 1] // "";    # empty at end of line

        if ($char eq "/" && $next eq "*")
        {
            # Opening delimiter: one level deeper, skip both characters
            $commentLevel++;
            $resultLine .= "  ";
            $j++;
            next;
        }
        if ($char eq "*" && $next eq "/" && $commentLevel > 0)
        {
            # Closing delimiter: one level up, skip both characters
            $commentLevel--;
            $resultLine .= "  ";
            $j++;
            next;
        }
        if ($commentLevel == 0 || $char eq "\n")
        {
            # Code outside comments, and every newline, is kept as-is
            $resultLine .= $char;
        }
        else
        {
            # Comment text is blanked out so column positions stay the same
            $resultLine .= " ";
        }
    }
    $ARRAY[$i] = $resultLine;
}
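For comparison, the same depth-counting idea can be sketched in a few lines of Python. This is only an illustration of the counting technique described above, not a full C lexer: the function name strip_nested_comments and the sample text are made up for this example, string literals are not treated specially, and comment text is dropped rather than blanked out (newlines inside comments are kept so line numbers still match the original).

```python
def strip_nested_comments(text: str) -> str:
    """Remove /* ... */ comments, allowing nesting; keep newlines inside them."""
    out = []
    depth = 0  # how many "/*" are currently open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            ch = text[i]
            # Outside comments everything is kept; inside, only newlines.
            if depth == 0 or ch == "\n":
                out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical sample, modelled on the nested comment shown earlier.
sample = (
    "a /* This is a comment\n"
    "/* that has been re-commented */ possibly /* due to */\n"
    "various modifications */ b\n"
)
print(repr(strip_nested_comments(sample)))
```

A stray "*/" outside any comment is left in the output untouched, so the counter can never go negative.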
Upvotes: 1
Reputation: 36412
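You want to make the change to $/ local, so that the previous value is restored automatically when the enclosing block ends. $/ is the input record separator: while it is undefined, <> reads the whole file in one go instead of one line at a time, which is why the reads later in your code behave differently. For example: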
$_ = do { local $/; <> };
or
{
local $/;
$_ = <>;
#...
}
Alternatively, you could use File::Slurp.
Upvotes: 1
Reputation: 342619
With awk:
$ cat file.c
one
two
three // comment at the back
// comment in front
four /* another comment */
/* comment spanning
multiple
lines
*/ five
six
seven
$ awk -vRS='*/' '{ gsub(/\/\*.*/,"");gsub("//.*","")}1' file.c
one
two
three
five
six
seven
The awk command sets the record separator RS to */, which is the closing tag of a multi-line comment. It then iterates over the records, looks for the opening tag /*, and keeps whatever comes before it. The concept is simple, and you don't have to craft a complicated regex for it. One caveat: within a record, the // substitution removes everything from the first // to the end of the record, not just to the end of the line, which is why four is missing from the output above. Similarly, if you were to do it in Python:
>>> data=open("file.c").read()
>>> for item in data.split("*/"):
...     if "//" in item: item=item.split("//")[0]
...     if "/*" in item: item=item.split("/*")[0]
...     print(item)
...
one
two
three
five
six
seven
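Note that four is missing from both outputs above, because everything after the first // in a chunk is discarded, not just the rest of that line. A minimal sketch that avoids this, and also leaves comment markers inside string literals alone, is to match literals and comments with a single regex and drop only the comments, in the spirit of the Perl one-liner in the question. The names TOKEN and strip_c_comments and the sample input are made up for this illustration, and nested comments are not handled.

```python
import re

# Alternatives are tried left to right at each position, so a quote that
# opens a literal is consumed as a literal before any comment marker inside
# it can be seen.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # double-quoted string literal (kept)
    r"|'(?:\\.|[^'\\])*'"     # single-quoted character literal (kept)
    r"|/\*.*?\*/"             # block comment, non-greedy (dropped)
    r"|//[^\n]*",             # line comment; the newline itself is kept
    re.DOTALL,
)


def strip_c_comments(source: str) -> str:
    def keep_literals(match):
        text = match.group(0)
        # Comments start with a slash, literals start with a quote.
        return "" if text.startswith("/") else text

    return TOKEN.sub(keep_literals, source)


# Hypothetical input, based on the file.c shown above plus one string literal.
sample = (
    "one\n"
    "three // comment at the back\n"
    "// comment in front\n"
    "four /* another comment */\n"
    "/* comment spanning\n"
    "multiple\n"
    "lines\n"
    "*/ five\n"
    's = "a // b"; /* not // nested */\n'
)
print(strip_c_comments(sample))
```

Trailing spaces and now-empty lines are left in place; strip them afterwards if the parser that follows cares about them.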
Upvotes: 2