Reputation: 6744
I need a Perl regex to remove end-of-line comments. I've Googled around for this but couldn't find quite the right thing. Here are the details:
An EOL comment is indicated by a pound sign (#).
Anything can be quoted using vertical bars (|).
So the following has a comment:
foo bar #baz
But the following doesn't:
foo |quoted###with bars|
The following has a comment and a quote that contains the comment character:
foo |quoted###with bars| #comment here
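The two rules above (an unquoted # starts a comment that runs to the end of the line; |...| quotes anything, possibly across lines) can also be handled without a regex, by scanning character by character and tracking whether we are inside a quote. This is a swapped-in technique, not what the question asks for: a minimal sketch in Java, assuming quotes have no escape mechanism (the question doesn't mention one) and that the whole text, not a single line, is passed in. The class name `EolCommentStripper` and method `strip` are hypothetical.

```java
// Hypothetical helper: strips #-to-end-of-line comments while leaving |...| quotes intact.
// Assumes bars cannot be escaped and that a quote may span several lines.
final class EolCommentStripper {

    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean inQuote = false;   // true between an opening | and its closing |
        boolean inComment = false; // true from an unquoted # until the next line break
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inComment) {
                if (c == '\n' || c == '\r') { // a comment ends at the line break
                    inComment = false;
                    out.append(c);            // keep the line break itself
                }
                continue;                     // drop every other comment character
            }
            if (c == '|') {
                inQuote = !inQuote;           // each bar toggles quoting
            } else if (c == '#' && !inQuote) {
                inComment = true;             // unquoted # starts a comment
                continue;
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Whitespace before a removed comment is kept, so "foo bar #baz" becomes "foo bar " with a trailing space.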
The first thing I tried was s/#(?=[^|]*$).*$//, which unfortunately removes quoted pounds. The next thing that doesn't work is s/#(?=[^|]*$).*$//, because it fails on multiline quotes like the following:
foo |quote begins here ##still going
##and it's still going| #this is a quote, though.
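To make the multiline failure concrete: assuming the input is processed one line at a time (the question doesn't say), the lookahead (?=[^|]*$) only checks for a | later on the same line, so it cannot know that the first line's ## sits inside a quote that closes on the next line. A small check in Java, whose regex syntax accepts this pattern unchanged; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: applies the lookahead attempt to a single line.
final class LookaheadAttempt {
    // A '#' followed only by non-bar characters up to the end of the line.
    private static final Pattern ATTEMPT = Pattern.compile("#(?=[^|]*$).*$");

    static String stripLine(String line) {
        return ATTEMPT.matcher(line).replaceAll("");
    }
}
```

Applied to "foo |quoted###with bars| #comment here" it correctly yields "foo |quoted###with bars| ", but applied to the first line of the multiline example it yields "foo |quote begins here ", cutting off the quoted ##still going.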
I feel like I may be able to glean something from the regex for C/C++ comments in perlfaq6, but it's too complicated for me to pick out just the parts I need (I don't need multiline comments ;)).
Can anyone provide a regex which removes EOL comments but ignores quoted comment characters?
Upvotes: 2
Views: 358
Reputation: 31
import java.util.Arrays;
import java.util.stream.Collectors;

public class CommonTest {

    // Index of t in s at or after start, or Integer.MAX_VALUE if absent,
    // so Math.min() can pick whichever token comes first.
    private static int find(String s, String t, int start) {
        int ret = s.indexOf(t, start);
        return ret < 0 ? Integer.MAX_VALUE : ret;
    }

    // Removes a trailing // comment from one line, skipping '...' and "..." literals.
    private static String removeLineComment(String s) {
        int start = 0;
        while (true) {
            int i = find(s, "//", start);
            int j = find(s, "'", start);
            int k = find(s, "\"", start);
            int first = Math.min(i, Math.min(j, k));
            if (first == Integer.MAX_VALUE) return s;   // no comment and no more quotes
            if (i == first) return s.substring(0, i);   // comment starts outside any literal
            // Skip the quoted literal, honoring backslash escapes such as \' and \".
            char quote = s.charAt(first);
            int p = first + 1;
            while (p < s.length() && s.charAt(p) != quote) {
                p += (s.charAt(p) == '\\') ? 2 : 1;
            }
            if (p >= s.length()) return s;              // unterminated literal: leave line as-is
            start = p + 1;                              // resume after the closing quote
        }
    }

    public static String removeLineComments(String s) {
        if (!s.contains("//")) return s; // Speed it up
        // \R matches any line break; limit -1 keeps empty lines instead of dropping them
        return Arrays.stream(s.split("\\R", -1))
                .map(CommonTest::removeLineComment)
                .collect(Collectors.joining("\n"));
    }
}
Upvotes: -1
Reputation: 183456
One approach:
s/(\|[^|]*\|)|#.*/$1||''/eg
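This replaces each |...| (including |...#...|) with itself, and each #... with nothing. The /e modifier makes Perl evaluate $1||'' as code for every match: when a quote matched, $1 holds it and it is put back unchanged; when #.* matched, $1 is undefined and the match is replaced with ''. Because [^|] also matches newlines, quotes that span lines are handled too, as long as the substitution runs on the whole text at once (e.g. a slurped file) rather than line by line; #.* still stops at the end of its line, since . doesn't match a newline without /s.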
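The same idea (match either a whole quote or a comment in one alternation, and put quotes back untouched) carries over to other regex engines. As a sketch, assuming Java's java.util.regex, here is a port where the $1||'' choice becomes an explicit null check on group 1; the class name `AlternationStripper` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical port of s/(\|[^|]*\|)|#.*/$1||''/eg to java.util.regex.
final class AlternationStripper {
    // Group 1: a complete |...| quote ([^|] also matches newlines, so quotes may span lines).
    // Otherwise: an unquoted # and the rest of its line ('.' stops at a line terminator).
    private static final Pattern QUOTE_OR_COMMENT = Pattern.compile("(\\|[^|]*\\|)|#.*");

    static String strip(String text) {
        Matcher m = QUOTE_OR_COMMENT.matcher(text);
        StringBuffer out = new StringBuffer(); // StringBuffer overload works on Java 8+
        while (m.find()) {
            String quote = m.group(1);         // null when the comment branch matched
            String replacement = (quote != null) ? quote : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Matcher.quoteReplacement is needed because a quote containing $ or \ would otherwise be misread as a group reference or escape in the replacement string.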
Upvotes: 2