Nate Glenn

Reputation: 6744

remove end-of-line comments unless quoted

I need a Perl regex to remove end-of-line comments. I've Googled around for this but couldn't find quite the right thing. Here are the details:

An EOL comment starts with a pound sign (#).

Anything can be quoted using vertical bars (|).

So the following has a comment:

foo bar #baz

But the following doesn't:

foo |quoted###with bars|

The following has a comment and a quote that contains the comment character:

foo |quoted###with bars| #comment here

The first thing I tried was s/#.*$//, which unfortunately removes quoted pounds. The next thing that doesn't work is s/#(?=[^|]*$).*$//, because it fails on multiline quotes, like the following:

foo |quote begins here ##still going
        ##and it's still going| #this is a quote, though.
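To see that failure concretely, here is a minimal Java sketch of the second attempt, assuming the input is stripped one line at a time (as a Perl while (<>) loop would do). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class LookaheadAttempt {
    // The second attempt, s/#(?=[^|]*$).*$//, as a Java pattern: delete from a '#'
    // to the end, but only if no '|' follows it before the end of the input.
    static final Pattern LOOKAHEAD = Pattern.compile("#(?=[^|]*$).*$");

    // Assumption: each line is stripped on its own, as in a Perl while (<>) loop.
    static String stripLine(String line) {
        return LOOKAHEAD.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // The closing '|' of this quote is on the next line, so the lookahead sees no
        // bar after the '##' and the quoted text is wrongly cut off.
        System.out.println("[" + stripLine("foo |quote begins here ##still going") + "]");
        // prints [foo |quote begins here ]
    }
}
```

On a single-line quote such as foo |quoted###with bars|, every # still has a bar after it on the same line, so the line is left alone; the problem only appears once a quote spans lines.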

I feel like I might be able to glean something from the regex for C/C++ comments in perlfaq6, but it's too complicated for me to pull out just the parts I need (I don't need multiline comments ;)).

Can anyone provide a regex which removes EOL comments but ignores quoted comment characters?

Upvotes: 2

Views: 358

Answers (2)

Kalju Pärn

Reputation: 31

import java.util.Arrays;
import java.util.stream.Collectors;

public class CommonTest {

    // Like String.indexOf, but returns Integer.MAX_VALUE instead of -1 when t is absent,
    // so "not found" always loses a min() comparison.
    private static int find(String s, String t, int start) {
        int ret = s.indexOf(t, start);
        return ret < 0 ? Integer.MAX_VALUE : ret;
    }

    // Removes a trailing // comment from one line, ignoring // inside '...' or "..." literals.
    private static String removeLineComment(String s) {
        int start = 0;
        while (true) {
            int i = find(s, "//", start);
            if (i == Integer.MAX_VALUE) return s;          // no comment marker left
            int j = find(s, "'", start);
            int k = find(s, "\"", start);
            int first = Math.min(i, Math.min(j, k));
            if (i == first) return s.substring(0, i);      // comment begins before any quote
            // A quoted literal begins first: skip to just past its matching closing quote.
            char quote = s.charAt(first);
            int p = first + 1;
            while (p < s.length() && s.charAt(p) != quote) {
                p += (s.charAt(p) == '\\') ? 2 : 1;         // step over escapes like \' or \"
            }
            if (p >= s.length()) return s;                 // unterminated literal: leave line as-is
            start = p + 1;
        }
    }

    // Applies removeLineComment to every line (note: runs of line breaks collapse to one "\n").
    public static String removeLineComments(String s) {
        if (!s.contains("//")) return s; // fast path: nothing to remove
        return Arrays.stream(s.split("[\\n\\r]+"))
                .map(CommonTest::removeLineComment)
                .collect(Collectors.joining("\n"));
    }
}
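The same skip-the-quotes scan can be adapted to the question's own syntax (# comments, |...| quotes). A minimal sketch, assuming bars never nest and cannot be escaped; HashCommentStripper and strip are hypothetical names. Unlike the line-splitting code above, it walks the whole text at once, so a quote that spans lines is handled.

```java
public class HashCommentStripper {
    // Hypothetical helper: removes '#' comments up to the end of their line, treating
    // everything between a pair of '|' as quoted, even across line breaks.
    // Assumes bars never nest and have no escape mechanism.
    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean inQuote = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '|') {
                inQuote = !inQuote;     // each bar opens or closes a quote
                out.append(c);
                i++;
            } else if (c == '#' && !inQuote) {
                // Unquoted '#': drop everything up to, but not including, the '\n'.
                while (i < text.length() && text.charAt(i) != '\n') {
                    i++;
                }
            } else {
                out.append(c);          // ordinary or quoted character: keep it
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("foo bar #baz") + "]");
        // prints [foo bar ]
    }
}
```

Keeping a single inQuote flag across the whole input is what lets the second line of a multi-line quote know it is still quoted, which per-line processing cannot do.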

Upvotes: -1

ruakh

Reputation: 183456

One approach:

s/(\|[^|]*\|)|#.*/$1||''/eg

This replaces |...| (including |...#...|) with itself, and #... with nothing.
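For readers outside Perl, here is a sketch of the same alternation trick ported to java.util.regex; the class name is hypothetical. Matcher.appendReplacement plays the role of the /e replacement: when group 1 (a whole quote) matched it is written back unchanged, otherwise the comment is replaced with nothing. The whole text must be passed in as one string for multi-line quotes to work.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationStrip {
    // Port of s/(\|[^|]*\|)|#.*/$1||''/eg. [^|] also matches newlines, so a quote may
    // span lines; '.' does not match '\n', so a comment stops at the end of its line.
    static final Pattern QUOTE_OR_COMMENT = Pattern.compile("(\\|[^|]*\\|)|#.*");

    static String strip(String text) {
        Matcher m = QUOTE_OR_COMMENT.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Keep a quote verbatim; drop a comment (the $1||'' part of the Perl version).
            String kept = m.group(1) != null ? m.group(1) : "";
            m.appendReplacement(out, Matcher.quoteReplacement(kept));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "foo |quote begins here ##still going\n"
                + "        ##and it's still going| #this is a quote, though.";
        // Prints both lines with only the trailing "#this is a quote, though." removed.
        System.out.println(strip(text));
    }
}
```

Because the regex engine tries the quote alternative at each bar before it ever reaches a # inside the quote, quoted pounds are consumed as part of group 1 and never seen as comment starts.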

Upvotes: 2
