Reputation: 67319
I have a C file which I copied from somewhere else, but it has a lot of comments like below:
int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)
How can I delete all the comments enclosed by /* and */? Sometimes a comment spans 4-5 lines, and I need to delete all of those lines.
Basically, I need to delete all text between /* and */, even when \n comes in between. Please help me do this using one of sed, awk or perl.
Upvotes: 14
Views: 10151
Reputation: 99
Try the recursive approach below to find and remove JavaScript-style comments, XML-style comments and single-line comments.
/* This is a multi-line JS comment.
Please remove me */
for f in $(find pages/ -name "*.*"); do perl -i -wpe 'BEGIN{undef $/} s!/\*.*?\*/!!sg' "$f"; done
<!-- This is a multi-line XML comment.
Please remove me -->
for f in $(find pages/ -name "*.*"); do perl -i -wpe 'BEGIN{undef $/} s!<\!--.*?-->!!sg' "$f"; done
// This is a single-line comment. Please remove me.
for f in $(find pages/ -name "*.*"); do sed -i 's|//.*||' "$f"; done
Note: pages is the root directory; the scripts above find and process every file in it and in its subdirectories.
Upvotes: 1
Reputation: 204731
You MUST use a C preprocessor for this, in combination with other tools that temporarily disable specific preprocessor functionality such as expanding #defines or #includes; all other approaches will fail in edge cases. This will work for all cases:
[ $# -eq 2 ] && arg="$1" || arg=""
eval file="\$$#"
sed 's/a/aA/g;s/__/aB/g;s/#/aC/g' "$file" |
gcc -P -E $arg - |
sed 's/aC/#/g;s/aB/__/g;s/aA/a/g'
Put it in a shell script and call it with the name of the file you want parsed, optionally prefixed by a flag like "-ansi" to specify the C standard to apply.
Upvotes: 4
Reputation: 146
When I want something short and simple for CSS, I use this:
awk -vRS='*/' '{gsub(/\/\*.*/,"")}1' FILE
This won't handle the case where comment delimiters appear inside strings but it's much simpler than a solution that does. Obviously it's not bulletproof or suitable for everything but you know better than the pedants on SO whether or not you can live with that.
I believe this one is bulletproof however.
Upvotes: 1
Reputation: 272437
See perlfaq6. It's quite a complex scenario.
$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
print;
A word of warning: once you've done this, do you have a test scenario to prove to yourself that you've removed only the comments and nothing valuable? If you're running such a powerful regexp, I'd make sure there is some sort of test (even if you simply record the behaviour before and afterwards).
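One way to record the behaviour before and afterwards is to compile both versions and compare the generated assembly. This is a rough sketch, assuming gcc is installed and the file compiles on its own. The names same_code and asm_of are hypothetical. Code that uses __LINE__ (for example via assert) can differ legitimately when comment lines disappear.

```shell
#!/bin/sh
# asm_of FILE: compile FILE from stdin so the file name does not leak
# into the assembly output, and print the assembly on stdout.
asm_of() {
    gcc -x c -S -o - - < "$1" 2>/dev/null
}

# same_code ORIGINAL STRIPPED
# Returns 0 if both files compile to identical assembly,
# 1 if the assembly differs, 2 if either file fails to compile.
same_code() {
    a=$(asm_of "$1") || return 2
    b=$(asm_of "$2") || return 2
    [ "$a" = "$b" ]
}

# Usage (file names are placeholders):
#   same_code t.c t.stripped.c && echo "only comments were removed"
```

Identical assembly is a strong hint that only comments changed. A difference is a prompt to inspect the diff, not proof of damage.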
Upvotes: 12
Reputation: 118166
Please do not use cpp for this unless you understand the ramifications:
$ cat t.c
#include <stdio.h>
#define MSG "Hello World"
int main(void) {
/* ANNOY: print MSG using the puts function */
puts(MSG);
return 0;
}
Now, let's run it through cpp:
$ cpp -P t.c -fpreprocessed
#include <stdio.h>
int main(void) {
puts(MSG);
return 0;
}
Clearly, this file is no longer going to compile.
Upvotes: 5
Reputation: 12047
Why not just use the C preprocessor to do this? Why are you confining yourself to a home-grown regex?
[Edit] This approach also handles Bart's printf(".../*...") scenario cleanly.
Example:
[File: t.c]
/* This is a comment */
int main () {
/*
* This
* is
* a
* multiline
* comment
*/
int f = 42;
/*
* More comments
*/
return 0;
}
$ cpp -P t.c
int main () {
int f = 42;
return 0;
}
Or you can remove the blank lines and condense everything:
$ cpp -P t.c | egrep -v "^[ \t]*$"
int main () {
int f = 42;
return 0;
}
No use re-inventing the wheel, is there?
[Edit]
If you do not want this approach to expand included files and macros, cpp provides flags for this. Consider:
[File: t.c]
#include <stdio.h>
int main () {
int f = 42;
printf(" /* ");
printf(" */ ");
return 0;
}
$ cpp -P -fpreprocessed t.c | grep -v "^[ \t]*$"
#include <stdio.h>
int main () {
int f = 42;
printf(" /* ");
printf(" */ ");
return 0;
}
There is a slight caveat in that macro expansion can be avoided, but the original definition of the macro is stripped from the source.
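One commonly cited way around that caveat is GCC's -dD option, which asks the preprocessor to keep #define directives in its output. The sketch below assumes GCC. The option combination is not from this answer and the function name is hypothetical, so check the result on your own files.

```shell
#!/bin/sh
# strip_keep_macros FILE.c  (hypothetical helper)
#   -fpreprocessed  do not expand macros or #include files
#   -dD             keep #define directives in the output
#   -E -P           preprocess only, without line markers
strip_keep_macros() {
    gcc -fpreprocessed -dD -E -P "$1"
}

# Usage (file name is a placeholder):
#   strip_keep_macros t.c > t.stripped.c
```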
Upvotes: 32
Reputation: 170308
Consider:
printf("... /* ...");
int matrix[20];
printf("... */ ...");
In other words: I wouldn't use regex for this task, unless you're doing a replace-once and are positive that the above does not occur.
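A scanner that tracks whether it is inside a string or character literal avoids this problem. The following is a minimal sketch, assuming a POSIX shell and awk. The function name strip_c_comments is hypothetical. It ignores trigraphs, stray quotes in preprocessor text, and // comments that end in a backslash.

```shell
#!/bin/sh
# strip_c_comments FILE  (hypothetical helper, not from the thread)
# Prints FILE with /* ... */ comments removed. Text inside "..." and
# character literals is copied verbatim; // comments are left in place.
strip_c_comments() {
    awk '
    BEGIN { state = "code" }            # code | comment | string | char
    {
        out = ""; n = length($0); i = 1
        touched = (state == "comment")  # did a comment overlap this line?
        while (i <= n) {
            c = substr($0, i, 1); two = substr($0, i, 2)
            if (state == "comment") {
                if (two == "*/") { state = "code"; i += 2 } else i++
            } else if (state == "code") {
                if (two == "/*") { state = "comment"; touched = 1; i += 2; continue }
                if (two == "//") { out = out substr($0, i); break }
                if (c == "\"") state = "string"
                else if (c == "\047") state = "char"   # \047 is a single quote
                out = out c; i++
            } else {
                # Inside a literal: copy escape pairs whole, stop at the matching quote.
                if (c == "\\") { out = out two; i += 2; continue }
                if ((state == "string" && c == "\"") || (state == "char" && c == "\047")) state = "code"
                out = out c; i++
            }
        }
        # Drop lines that held nothing but comment text.
        if (!(touched && out ~ /^[ \t]*$/)) print out
    }' "$1"
}

# Usage (file name is a placeholder):
#   strip_c_comments t.c > t.stripped.c
```

Lines that contained only comment text are dropped, while code that shares a line with a comment keeps its place, so the two printf calls above pass through unchanged.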
Upvotes: 4
Reputation: 118166
Take a look at the strip_comments routine in Inline::Filters:
sub strip_comments {
my ($txt, $opn, $cls, @quotes) = @_;
my $i = -1;
while (++$i < length $txt) {
my $closer;
if (grep {my $r=substr($txt,$i,length($_)) eq $_; $closer=$_ if $r; $r}
@quotes) {
$i = skip_quoted($txt, $i, $closer);
next;
}
if (substr($txt, $i, length($opn)) eq $opn) {
my $e = index($txt, $cls, $i) + length($cls);
substr($txt, $i, $e-$i) =~ s/[^\n]/ /g;
$i--;
next;
}
}
return $txt;
}
Upvotes: 6
Reputation: 4873
Try this on the command line (replacing 'file-names' with the list of files that need to be processed):
perl -i -wpe 'BEGIN{undef $/} s!/\*.*?\*/!!sg' file-names
This program changes the files in-place (overwriting the original file with the corrected output). If you just want the output without changing the original files, omit the '-i' switch.
Explanation:
perl -- call the perl interpreter
-i -- switch to 'change-in-place' mode
-w -- print warnings to STDERR (if there are any)
-p -- read the files and print $_ for each record; like while(<>){ ...; print $_;}
-e -- process the following argument as a program (once for each input record)
BEGIN{undef $/} -- process whole files instead of individual lines
s! -- search and replace ...
/\* -- the starting /* marker
.*? -- followed by any text (non-greedy match)
\*/ -- followed by the */ marker
!! -- replace with the empty string (i.e. remove comments)
s -- let . match newline characters \n as well (so multi-line comments are removed)
g -- repeat as necessary to process all comments
file-names -- list of files to be processed
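The difference between the non-greedy .*? and a greedy .* is easiest to see on a small input. This is a sketch, assuming perl is on the PATH. The sample text is made up for illustration, and -0777 (slurp the whole input) is used here in place of BEGIN{undef $/}.

```shell
#!/bin/sh
# Two comments on made-up input; the second one spans a line break.
sample='a /* x */ b /* y
z */ c'

# Non-greedy .*? stops at the first */ after each /*, so both comments
# are removed and the code between them survives.
printf '%s\n' "$sample" | perl -0777 -pe 's!/\*.*?\*/!!sg'
# prints: a  b  c

# Greedy .* runs from the first /* to the last */ and swallows " b " too.
printf '%s\n' "$sample" | perl -0777 -pe 's!/\*.*\*/!!sg'
# prints: a  c
```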
Upvotes: 3
Reputation: 343211
A very simplistic example using gawk. Please test it thoroughly before relying on it. Of course, it doesn't handle the other comment style, // (as in C++).
$ more file
int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)
/*
function(){
blah blah
}
*/
float a;
float b;
$ awk -vRS='*/' '{ gsub(/\/\*.*/,"")}1' file
int matrix[20];
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
for (index = 0; index < 5 ;index++)
float a;
float b;
Upvotes: 0