How can I match and modify C and C++ comments with Perl?

Question

I have the task of (trying to) do a search and replace across a large codebase for a word suffix, but only where it occurs within comments. All of the comments are of the /* or // type, but they are guaranteed to include most of the edge cases imaginable.

So I want to change this:

/* blah blah something__suffix blah */

to this:

/* blah blah something blah */

but I also want to change this:

// blah blah something__suffix blah

to this:

// blah blah something blah

And this:

/*
 * blah blah something__suffix blah 
 */

to this:

/*
 * blah blah something blah 
 */

And this:

/** 

// blah blah something__suffix blah 

*/

To this:

/** 

// blah blah something blah 

*/

ad nauseam (literally).

Initially I felt that this was a parser task, so I installed Coccinelle. It could indeed parse my comments, but it got stuck on my preprocessor macros, and the workarounds seemed complex for someone who has this only as a one-off task. So now I'm considering regex.

I haven't found much advice on doing really robust search and replace within C and C++ comments with regex (besides "you need a parser"), but I did notice that the Perl FAQ has a pretty well road-tested Perl script for removing comments in both of these styles.

It is as follows:

$/ = undef;
$_ = <>;

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

print;

My question: how to adapt this script so that instead of stripping the comment, the text that has been identified as a comment can then be searched for the suffix and the suffix removed, leaving the rest of the comment intact?

ikegami · Accepted Answer

You need to do it in two steps because you might have

/* foo__suffix bar__suffix */

First, extract the comment, then substitute any __suffix in the comment.

s{
   \G
   (?:(?!/[*/]).)*
   \K
   (   /[*] (?:(?![*]/).)* [*]/
   |   //   [^\n]*
   )
}{
   my $comment = $1;
   $comment =~ s/(?<=\w)__suffix//g;
   $comment
}xesg;

Notes:

(?:(?!STRING).) is to (?:STRING) as [^CHAR] is to CHAR.
My solution will mess up if you have // or /* in a string literal.
If you're ok with removing instances of __suffix that aren't preceded by an identifier, you can remove the (?<=\w).
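
The string-literal caveat above can be worked around by matching literals as well and passing them through unchanged, much as the Perl FAQ script does. The following is only a sketch under that assumption: the name strip_suffix_in_comments is hypothetical, and it does not try to handle C++ raw string literals, digit separators such as 1'000, or // comments continued with a trailing backslash.

```perl
use strict;
use warnings;

# Hypothetical helper: remove "__suffix" inside comments only.
# String and character literals are matched too, so that a "//" or "/*"
# inside them is consumed as part of the literal and never seen as the
# start of a comment.
sub strip_suffix_in_comments {
    my ($src) = @_;
    $src =~ s{
        (   /[*] (?:(?![*]/).)* [*]/      # group 1: block comment
        |   //   [^\n]*                   #          or line comment
        )
      |
        (   " (?: \\. | [^"\\] )* "       # group 2: string literal
        |   ' (?: \\. | [^'\\] )* '       #          or character literal
        )
    }{
        my ($comment, $literal) = ($1, $2);
        $comment =~ s/(?<=\w)__suffix//g if defined $comment;
        defined $comment ? $comment : $literal
    }xesg;
    return $src;
}
```

Because the regex engine always takes the leftmost match, whichever construct starts first (comment or literal) is consumed whole, so a quote inside a comment or a comment marker inside a string is not misread. Call it on the slurped contents of a file, for example after setting $/ to undef as in the FAQ script.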

If you're using Perl 5.14 or higher, you can simplify

s{...}{
   my $comment = $1;
   $comment =~ s/(?<=\w)__suffix//g;
   $comment
}xesg;

to

s{...}{
   $1 =~ s/(?<=\w)__suffix//rg
}xesg;
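
For reference, here is one way the pieces might be assembled into something runnable. This is a minimal sketch rather than a definitive implementation: fix_comments is a hypothetical wrapper name, the sample input is made up, and the sketch uses the /g flag so that every comment in the input is processed, not just the first.

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier needs Perl 5.14 or later

# Hypothetical wrapper: returns a copy of the source text with
# "__suffix" removed from every /* ... */ and // comment.
sub fix_comments {
    my ($src) = @_;
    $src =~ s{
        \G
        (?:(?!/[*/]).)*
        \K
        (   /[*] (?:(?![*]/).)* [*]/
        |   //   [^\n]*
        )
    }{
        $1 =~ s/(?<=\w)__suffix//rg
    }xesg;
    return $src;
}

# Made-up sample: two suffixes in one block comment, one in code
# (which must be left alone), and one in a line comment.
my $sample = <<'END';
/* foo__suffix bar__suffix */
int x__suffix; // baz__suffix
END

print fix_comments($sample);
```

To run it as a filter over real files, replace the sample with the slurp idiom from the FAQ script (set $/ to undef, read with <>, and print the result of fix_comments).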
