Ragesh ck

Reputation: 51

How to remove all comments from a string without affecting a URL in Java

I need to remove all types of comments from a string without affecting the URL defined in that string. When I tried removing the comments with a regular expression, part of the URL was removed as well. This is the regex I tried:

    String sourceCode= "/*\n"
                + " * Multi-line comment\n"
                + " * Creates a new Object.\n"
                + " */\n"
                + "public Object someFunction() {\n"
                + " // single line comment\n"
                + " Object obj =  new Object();\n"
                + " return obj; /* single-line comment */\n"
                + "}"
                + "\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";

    sourceCode=sourceCode.replaceAll("//.*|/\\*((.|\\n)(?!=*/))+\\*/", "");
    System.out.println(sourceCode);

The comments are removed, but the output looks like this:

    public Object someFunction() {
        Object obj =  new Object();
        return obj; 
    }
    https:

Please help me find a solution for this.

Upvotes: 2

Views: 1528

Answers (3)

invenit

Reputation: 424

Use [^:]//.*|/\\*((.|\\n)(?!=*/))+\\*/ instead. The only change is the [^:] at the start, which means the character before // must not be a colon.
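As a rough illustration of what the [^:] guard does, the sketch below applies the pattern to a shortened input. The class name ColonGuardDemo, the sample input, and the lookbehind variant (?<!:) are assumptions added for illustration and are not part of the pattern above. The variant is included because [^:] consumes the character in front of // and therefore cannot match a comment at the very start of the string.

```java
final class ColonGuardDemo {
    // Pattern from this answer: "//" only counts when the preceding character is not ':'.
    static final String ANSWER_PATTERN = "[^:]//.*|/\\*((.|\\n)(?!=*/))+\\*/";

    // Assumed variant: a negative lookbehind checks for ':' without consuming a character.
    static final String LOOKBEHIND_PATTERN = "(?<!:)//.*|/\\*((.|\\n)(?!=*/))+\\*/";

    static String strip(String source, String pattern) {
        return source.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        // Shortened, hypothetical input: one trailing comment, one block comment, one URL.
        String source = "int a = 1;// trailing comment\n"
                + "return a; /* block comment */\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";

        System.out.println(strip(source, ANSWER_PATTERN));
        System.out.println("----");
        System.out.println(strip(source, LOOKBEHIND_PATTERN));
    }
}
```

With this input, the first call also drops the semicolon in front of the comment, while the lookbehind version keeps it. Neither pattern knows about string literals, so a // inside a quoted string that does not follow a colon would still be cut.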

I usually use regex101.com to work with regular expressions. Select the Python flavor for your case (escaping differs slightly between languages).

This regexp is quite hard for a human to read, so another solution is to use several simple expressions and process the incoming text in multiple passes, for example:

  1. Remove one-line comments
  2. Remove multiline comments
  3. Process some special cases
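The three passes above could be sketched roughly as follows. This is only one possible reading of the list: the class name MultiPassCommentRemover, the three patterns, and the choice of blank-line cleanup as the "special case" are all assumptions.

```java
import java.util.regex.Pattern;

final class MultiPassCommentRemover {
    // Pass 1: "//" to end of line, unless the "//" directly follows ':' (as in "https://").
    private static final Pattern LINE_COMMENT = Pattern.compile("(?<!:)//.*");

    // Pass 2: "/* ... */" across lines. DOTALL lets '.' match newlines,
    // and the reluctant ".*?" stops at the first "*/".
    private static final Pattern BLOCK_COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    // Pass 3 (assumed special case): drop lines that are empty or whitespace-only.
    private static final Pattern BLANK_LINE = Pattern.compile("^[ \\t]*\\r?\\n", Pattern.MULTILINE);

    static String removeComments(String source) {
        String result = LINE_COMMENT.matcher(source).replaceAll("");
        result = BLOCK_COMMENT.matcher(result).replaceAll("");
        return BLANK_LINE.matcher(result).replaceAll("");
    }

    public static void main(String[] args) {
        String sourceCode = "/*\n"
                + " * Multi-line comment\n"
                + " */\n"
                + "public Object someFunction() {\n"
                + " // single line comment\n"
                + " return new Object(); /* single-line comment */\n"
                + "}\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";
        System.out.println(removeComments(sourceCode));
    }
}
```

Each pass is simple enough to test by itself, but the approach is still purely textual. The order of the passes matters: because the line pass runs first, a // inside a block comment can swallow the closing */ on that line.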

Note: Evaluating regular expressions can be slow. If performance matters, consider another solution, such as your own processor or a third-party library.
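For the "own processor" route, a minimal hand-written scanner might look like this. It is a sketch under assumptions rather than a full Java lexer: the class name CommentScanner is made up, string and char literals are copied through unchanged, and a // directly after a colon is left alone so that a bare URL like the one in the question survives.

```java
final class CommentScanner {
    static String strip(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            char next = (i + 1 < n) ? src.charAt(i + 1) : '\0';

            if (c == '"' || c == '\'') {
                // Copy a string or char literal verbatim, honouring backslash escapes.
                // An unterminated literal ends at the line break.
                int end = i + 1;
                while (end < n && src.charAt(end) != c && src.charAt(end) != '\n') {
                    if (src.charAt(end) == '\\' && end + 1 < n) {
                        end++; // skip the escaped character
                    }
                    end++;
                }
                end = Math.min(end + 1, n); // include the closing quote
                out.append(src, i, end);
                i = end;
            } else if (c == '/' && next == '*') {
                // Block comment: jump past the closing "*/" (or to the end if unterminated).
                int close = src.indexOf("*/", i + 2);
                i = (close < 0) ? n : close + 2;
            } else if (c == '/' && next == '/' && (i == 0 || src.charAt(i - 1) != ':')) {
                // Line comment: skip to the end of the line but keep the newline itself.
                int eol = src.indexOf('\n', i);
                i = (eol < 0) ? n : eol;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "String u = \"http://a//b\"; // comment\n"
                + "/* block */int y;\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";
        System.out.println(strip(source));
    }
}
```

The scanner makes a single pass over the input and never backtracks, which is the main reason to prefer it over a regex when the input is large. The colon check is a heuristic taken from the regex above, not a rule of the Java language.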

EDIT: As @Wiktor suggested, the expression [^:]//.*|/\\*((?!=*/)(?s:.))+\\*/ is a faster solution, at least 2-3 times faster.

Upvotes: 1

Sunil Kanzar

Reputation: 1260

To be more specific, this expression should be used:

.*[^:]//.*|/\\*((.|\\n)(?!=*/))*\\*/

The pattern you provided cannot remove an empty /**/ comment if one is present. (If that is a deliberate requirement, then it is fine.)
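The difference between + and * on an empty comment can be checked with a small sketch like the one below. The class name EmptyCommentDemo and the sample inputs are hypothetical; WITH_PLUS and WITH_STAR are just the block-comment halves of the two expressions, and FULL_PATTERN is the expression given above.

```java
final class EmptyCommentDemo {
    // Block-comment part of the question's pattern: '+' needs at least one character inside.
    static final String WITH_PLUS = "/\\*((.|\\n)(?!=*/))+\\*/";

    // Same part with '*': zero characters inside the comment are allowed.
    static final String WITH_STAR = "/\\*((.|\\n)(?!=*/))*\\*/";

    // The full expression from this answer.
    static final String FULL_PATTERN = ".*[^:]//.*|/\\*((.|\\n)(?!=*/))*\\*/";

    static String strip(String source, String pattern) {
        return source.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        String source = "int a;/**/int b;";
        System.out.println(strip(source, WITH_PLUS)); // the empty comment is left in place
        System.out.println(strip(source, WITH_STAR)); // the empty comment is removed

        // The leading ".*" also matches whatever precedes the comment on the same line.
        System.out.println("[" + strip("return x; // note", FULL_PATTERN) + "]");
    }
}
```

One thing to watch for: because the first alternative starts with .*, code that shares a line with a // comment is removed together with the comment.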

So your expression looks like this:
[image]

And it should be:
[image]

To understand it better, paste your expression .*[^:]\/\/.*|\/\*((.|\n)(?!=*\/))*\*\/ into a regex visualization tool; it will show you a graph of the pattern.

Upvotes: 0

ahmetcetin

Reputation: 2990

You can split your String by "\n" and check each line. Here is the tested code:

    String sourceCode = "/*\n"
                + " * Multi-line comment\n"
                + " * Creates a new Object.\n"
                + " */\n"
                + "public Object someFunction() {\n"
                + " // single line comment\n"
                + " Object obj =  new Object();\n"
                + " return obj; /* single-line comment */\n"
                + "}"
                + "\n"
                + "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";

    String[] parts = sourceCode.split("\n");

    System.out.println(getUrlFromText(parts));

Here is the fetching method:

    // Returns the first line that starts with "http", or null if there is none.
    private static String getUrlFromText(String[] parts) {
        for (String part : parts) {
            if (part.startsWith("http")) {
                return part;
            }
        }

        return null;
    }

Upvotes: 0
