Reputation: 3584
My task is to (try to) search and replace a word suffix throughout a large codebase, but only where it occurs within comments. All of the comments are of the /* or // type, but they are guaranteed to include most of the edge cases imaginable.
So I want to change this:
/* blah blah something__suffix blah */
to this:
/* blah blah something blah */
but I also want to change this:
// blah blah something__suffix blah
to this:
// blah blah something blah
And this:
/*
* blah blah something__suffix blah
*/
to this:
/*
* blah blah something blah
*/
And this:
/**
// blah blah something__suffix blah
*/
To this:
/**
// blah blah something blah
*/
ad nauseam (literally).
Initially I felt that this was a parser task, so I installed Coccinelle. It could indeed parse my comments, but it got stuck on my preprocessor macros, and the workarounds seemed too complex for a one-off task. So now I'm considering regex.
I haven't found much advice on doing really robust search and replace within C and C++ comments with regex (besides "you need a parser"), but I did notice that the Perl FAQ has a pretty well road-tested Perl script for removing comments in both of these styles, as follows:
$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
print;
My question: how to adapt this script so that instead of stripping the comment, the text that has been identified as a comment can then be searched for the suffix and the suffix removed, leaving the rest of the comment intact?
Upvotes: 2
Views: 330
Reputation: 386331
You need to do it in two steps because a single comment might contain more than one match:
/* foo__suffix bar__suffix */
First extract the comment, then substitute every __suffix in the comment.
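s{
   \G
   (?:(?!/[*/]).)*                   # skip text up to the next comment start
   \K                                # keep the skipped text out of the match
   (  /[*] (?:(?![*]/).)* [*]/       # block comment
   |  // [^\n]*                      # line comment
   )
}{
   my $comment = $1;
   $comment =~ s/(?<=\w)__suffix//g;
   $comment
}xesg;   # /g so that every comment is processed, not just the first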
Notes:
(?:(?!STRING).) is to (?:STRING) as [^CHAR] is to CHAR.
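To make that analogy concrete, here is a minimal sketch of the idiom on its own. The sample text is made up for illustration; the point is only that the tempered dot stops at the two-character string */ in the same way that [^*] would stop at a single character.

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the question.
my $text = "int a; /* one */ int b; /* two */";

# (?:(?![*]/).)* consumes characters one at a time, but only while the
# upcoming two characters are not "*/", so each match ends at the first
# closing delimiter instead of running on to the last one.
my @comments = $text =~ m{ ( /[*] (?:(?![*]/).)* [*]/ ) }xsg;

print "$_\n" for @comments;
```

With a plain greedy .* in place of the tempered dot, the same pattern would return a single match spanning both comments.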
My solution will mess up if you have // or /* in a string literal.
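If that matters for your code, one possible workaround (a sketch, not something I have run against a real codebase) is to let the skipping part consume quoted literals whole, so a comment can never start inside one. The sample input is invented, the /r flag assumes Perl 5.14 or higher, and an unmatched apostrophe outside a comment (for example in an #error line) would still confuse it.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the question.
my $code = <<'END';
char *s = "// keep__suffix"; /* drop__suffix here */
int n = 1; // and drop__suffix here
END

$code =~ s{
    \G
    (?: " (?: \\. | [^"\\] )* "       # skip a double-quoted literal whole
      | ' (?: \\. | [^'\\] )* '       # skip a character literal whole
      | (?!/[*/]) [^"']               # or one char that starts neither
    )*
    \K
    (  /[*] (?:(?![*]/).)* [*]/       # block comment
    |  // [^\n]*                      # line comment
    )
}{
    $1 =~ s/(?<=\w)__suffix//gr
}xesg;

print $code;
```

The third alternative excludes quote characters on purpose: a quote can then only be consumed as part of a complete literal, so backtracking cannot step into the middle of a string and find a fake comment there.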
If you're OK with removing instances of __suffix that aren't preceded by an identifier, you can remove the (?<=\w).
If you're using Perl 5.14 or higher, you can simplify
s{...}{
   my $comment = $1;
   $comment =~ s/(?<=\w)__suffix//g;
   $comment
}xesg;
to
s{...}{
   $1 =~ s/(?<=\w)__suffix//rg
}xesg;
Upvotes: 1
Reputation: 54373
I'm not sure if this is a good solution, but it works.
use strict; use warnings; use feature qw(say);
my @lines = (
qq~Example 1:
/* blah blah something__suffix blah */~,
qq~Example 2:
// blah blah something__suffix blah needs a newline at the end
~,
qq~Example 3:
/*
* blah blah something__suffix blah
*/~,
qq~Example 4:
/**
// blah blah something__suffix blah
*/~,
qq~Example 5 (string):
foobar '// blah blah something__suffix blah '~,
qq~Example 6:
public void main { return; } // this does__suffix nothing but needs newline
~,
);
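foreach (@lines) {
    print "Before:\n$_\n";
    # $3 is defined for non-comment text, which is returned unchanged;
    # otherwise the whole match is a comment and every __suffix is removed.
    s!/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)!
      { if (defined $3) { $3 } else { (my $temp = ${^MATCH}) =~ s/__suffix//g; $temp; } }
    !gsepx;
    print "After:\n$_\n\n";
}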
It's probably not very efficient, but I don't think that is important for your job.
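The "needs a newline" remarks in the examples come from the // branch of the pattern, which only matches when the comment is terminated by \n. Here is a small sketch of that behaviour. strip_suffix_in_comments is a hypothetical helper name of mine that wraps the same pattern, and the /r flag assumes Perl 5.14 or higher.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the perlfaq-style pattern used above.
# Non-comment text lands in $3 and is returned as-is; for comments the
# whole match has every __suffix removed.
sub strip_suffix_in_comments {
    my ($src) = @_;
    $src =~ s{/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)}{
        defined $3 ? $3 : ${^MATCH} =~ s/__suffix//gr
    }gsep;
    return $src;
}

my $with_newline    = strip_suffix_in_comments("x = 1; // a__suffix b__suffix\n");
my $without_newline = strip_suffix_in_comments("x = 1; // a__suffix b__suffix");

print $with_newline;           # comment recognised, both suffixes removed
print "$without_newline\n";    # no trailing newline, so left unchanged
```

So if a file might end in a // comment without a final newline, it is probably simplest to append one before running the substitution.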
Upvotes: 1