Jong Lee

Reputation: 33

Regex to replace comments with number of new lines

I want to replace every Java-style block comment (/* */) with the newlines it contains, so that line numbers in the rest of the file are preserved. So far, I have only come up with something that replaces comments with an empty string:

String.replaceAll("/\\*[\\s\\S]*?\\*/", "")

Is it possible to replace each match with as many newlines as it contains instead? If this is not possible with regex matching alone, what is the best way to do it?

For example,

/* This comment
has 2 new lines
contained within */

will be replaced with a string of just 2 new lines.

Upvotes: 2

Views: 313

Answers (3)

user557597

Since Java supports the \G construct, you can do it all in one go
with a global regex replace (replaceAll in Java).

Find

"(?:\\/\\*(?=[\\S\\s]*?\\*\\/)|(?<!\\*\\/)(?!^)\\G)(?:(?!\\r?\\n|\\*\\/).)*((?:\\r?\\n)?)(?:\\*\\/)?"

Replace

"$1"

https://regex101.com/r/l1VraO/1

Expanded

 (?:
      / \* 
      (?= [\S\s]*? \* / )
   |  
      (?<! \* / )
      (?! ^ )
      \G 
 )
 (?:
      (?! \r? \n | \* / )
      . 
 )*
 (                             # (1 start)
      (?: \r? \n )?
 )                             # (1 end)
 (?: \* / )?
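
To show how this pattern might be wired into Java, here is a minimal sketch. The class name `CommentToNewlines` and the method `replaceComments` are made up for illustration. The pattern is written as a plain Java string literal, because the `/.../` delimiters shown on regex101 are not part of a Java pattern.

```java
import java.util.regex.Pattern;

public class CommentToNewlines {
    // The \G-based pattern from this answer, split across lines for readability.
    private static final Pattern COMMENT_LINE = Pattern.compile(
        "(?:/\\*(?=[\\S\\s]*?\\*/)|(?<!\\*/)(?!^)\\G)"  // start of a comment, or continue after the previous match
        + "(?:(?!\\r?\\n|\\*/).)*"                      // comment text up to a line break or the closing */
        + "((?:\\r?\\n)?)"                              // group 1: the line break, if there is one
        + "(?:\\*/)?");                                 // the closing */, if reached

    // Hypothetical helper: each matched comment line is replaced by its captured line break.
    static String replaceComments(String source) {
        return COMMENT_LINE.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        String source = "int a;\n/* This comment\nhas 2 new lines\ncontained within */\nint b;";
        // Newlines are escaped for display only.
        System.out.println(replaceComments(source).replace("\n", "\\n"));
        // prints: int a;\n\n\n\nint b;
    }
}
```

Each match covers at most one line of a comment, so the replacement `$1` puts back exactly the line break that line ended with.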

==================================================
==================================================

If you also need to handle comment delimiters that appear inside
quoted strings, like this

String comment = "/* this is a comment*/";

here is an extended regex that matches quoted strings as well as comments, so string contents are left untouched.
It is still a single regex applied in one global find / replace.

Find

"(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\")|(?:\\/\\*(?=[\\S\\s]*?\\*\\/)|(?<!\")(?<!\\*\\/)(?!^)\\G)(?:(?!\\r?\\n|\\*\\/).)*((?:\\r?\\n)?)(?:\\*\\/)?"

Replace

"$1$2"

https://regex101.com/r/tUwuAI/1

Expanded

    (                             # (1 start)
         "
         [^"\\]* 
         (?:
              \\ [\S\s] 
              [^"\\]* 

         )*
         "
    )                             # (1 end)
 |  
    (?:
         / \* 
         (?= [\S\s]*? \* / )
      |  
         (?<! " )
         (?<! \* / )
         (?! ^ )
         \G 
    )
    (?:
         (?! \r? \n | \* / )
         . 
    )*
    (                             # (2 start)
         (?: \r? \n )?
    )                             # (2 end)
    (?: \* / )?
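
A sketch of the string-aware variant in Java, under the same assumptions as before: the names `StringAwareCommentStripper` and `replaceComments` are hypothetical, and the pattern is given without regex101's `/.../` delimiters. Group 1 holds a quoted string, which is written back unchanged; group 2 holds the line break of a comment line.

```java
import java.util.regex.Pattern;

public class StringAwareCommentStripper {
    private static final Pattern STRING_OR_COMMENT_LINE = Pattern.compile(
        "(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\")"                // group 1: a double-quoted string with escapes
        + "|(?:/\\*(?=[\\S\\s]*?\\*/)|(?<!\")(?<!\\*/)(?!^)\\G)"     // start of a comment, or continue inside one
        + "(?:(?!\\r?\\n|\\*/).)*"                                   // comment text on the current line
        + "((?:\\r?\\n)?)"                                           // group 2: the line break, if any
        + "(?:\\*/)?");                                              // the closing */, if reached

    // Hypothetical helper: strings are kept as-is, comment lines collapse to their line break.
    // A group that did not take part in the match contributes nothing to the replacement.
    static String replaceComments(String source) {
        return STRING_OR_COMMENT_LINE.matcher(source).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String source = "String s = \"/* not a comment */\"; /* real\ncomment */ int x;";
        System.out.println(replaceComments(source));
    }
}
```

Because the quoted-string alternative is tried at the opening quote, the scan never reaches a `/*` that sits inside a string literal.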

Upvotes: 1

Andreas

Reputation: 159086

You can do it with a regex "replacement loop".

Most easily done in Java 9+:

String result = Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input)
                       .replaceAll(r -> r.group().replaceAll(".*", ""));

The main regex has been optimized for performance. The lambda has not been optimized.
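
If the inner `replaceAll(".*", "")` ever matters for performance, one possible alternative is to copy the line-break characters by hand. This is only a sketch: the class `KeepLineBreaks` and the helper `lineBreaksOnly` are made-up names, and unlike `.` in a Java regex, the helper keeps only `\r` and `\n` (it drops rarer line terminators such as `\u2028`).

```java
import java.util.regex.Pattern;

public class KeepLineBreaks {
    // Same comment pattern as in this answer.
    private static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/");

    // Hypothetical helper: returns only the '\r' and '\n' characters of the text, in order.
    static String lineBreaksOnly(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String replaceComments(String input) {
        // Matcher.replaceAll(Function) requires Java 9+.
        // The returned text contains no '$' or '\', so it is safe as a replacement string.
        return BLOCK_COMMENT.matcher(input).replaceAll(r -> lineBreaksOnly(r.group()));
    }

    public static void main(String[] args) {
        String input = "Line 1\n/* This\n   comment */\nLine 4";
        System.out.println(replaceComments(input));
    }
}
```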

For all Java versions:

Matcher m = Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input);
StringBuffer buf = new StringBuffer();
while (m.find())
    m.appendReplacement(buf, m.group().replaceAll(".*", ""));
String result = m.appendTail(buf).toString();

Test

final String input = "Line 1\n"
                   + "/* Inline comment */\n"
                   + "Line 3\n"
                   + "/* One-line\n"
                   + "   comment */\n"
                   + "Line 6\n"
                   + "/* This\n"
                   + "   comment\n"
                   + "   has\n"
                   + "   4\n"
                   + "   lines */\n"
                   + "Line 12";

Matcher m = Pattern.compile("(?s)/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input);
String result = m.replaceAll(r -> r.group().replaceAll(".*", ""));

// Show input/result side-by-side
String[] inLines = input.split("\n", -1);
String[] resLines = result.split("\n", -1);
int lineCount = Math.max(inLines.length, resLines.length);
System.out.println("input                    |result");
System.out.println("-------------------------+-------------------------");
for (int i = 0; i < lineCount; i++) {
    System.out.printf("%-25s|%s%n", (i < inLines.length ? inLines[i] : ""),
                                    (i < resLines.length ? resLines[i] : ""));
}

Output

input                    |result
-------------------------+-------------------------
Line 1                   |Line 1
/* Inline comment */     |
Line 3                   |Line 3
/* One-line              |
   comment */            |
Line 6                   |Line 6
/* This                  |
   comment               |
   has                   |
   4                     |
   lines */              |
Line 12                  |Line 12

Upvotes: 1

Emma

Reputation: 27723

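Maybe this expression,

\/\*.*?\*\/

in s (DOTALL) mode might be close to what you have in mind. Note that the test below replaces every comment with a fixed two-newline string rather than counting the newlines in each comment.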

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class re{

    public static void main(String[] args){

        final String regex = "\\/\\*.*?\\*\\/";
        final String string = "/* This comment\n"
             + "has 2 new lines\n"
             + "contained within */\n\n"
             + "Some codes here 1\n\n"
             + "/* This comment\n"
             + "has 2 new lines\n"
             + "contained within \n"
             + "*/\n\n\n"
             + "Some codes here 2";
        final String subst = "\n\n";

        final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        final Matcher matcher = pattern.matcher(string);

        final String result = matcher.replaceAll(subst);

        System.out.println(result);

    }
}

Output

Some codes here 1






Some codes here 2

If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel, and you can also watch there how it matches against sample inputs.


Upvotes: 0
