Vasiliy Bohdanets

Reputation: 109

Java regex for deleting all comments programmatically

I have a text file containing code:

 /*Comment here*/

 public void start(Stage primaryStage) throws Exception{
    Parent root = FXMLLoader.load(getClass().getResource("sample.fxml"));
    primaryStage.setTitle("First");
/*Comment here
*and
*here*/
    primaryStage.setScene(new Scene(root, 640, 480));
    primaryStage.show();//Comment this
//and comment that
}

I want to strip the comments so that it looks like this:

 public void start(Stage primaryStage) throws Exception{
    Parent root = FXMLLoader.load(getClass().getResource("sample.fxml"));
    primaryStage.setTitle("First");
    primaryStage.setScene(new Scene(root, 640, 480));
    primaryStage.show();
}

I've tried this:

 public String delComments(String content){
    Pattern regex = Pattern.compile("/\\*.*?\\*/|/{2,}[^\\n]*", Pattern.MULTILINE);
    Matcher matcher = regex.matcher(content);
    String clean = content.replaceAll("(?s:/\\*.*?\\*/)|//.*", "");
    return clean;
}

This is the method that reads the file and writes the result:

public void delCommentAction(ActionEvent actionEvent) throws IOException {
    String line = null;
    FileReader fileReader =
            new FileReader(filePath);
    BufferedReader bufferedReader =
            new BufferedReader(fileReader);
    FileWriter fw = new FileWriter(filePathNoComm);
    BufferedWriter bw = new BufferedWriter(fw);
    while((line = bufferedReader.readLine()) != null) {
        bw.write(delComments(line));
    }
    bw.close();
}

But it doesn't work (the comments weren't deleted).

Upvotes: 1

Views: 1353

Answers (1)

Andreas

Reputation: 159086

As suggested in a comment, you should use a full parser, because the Java language is too complex for a regex to do this accurately.

However, if you are OK with a few caveats, it can be done with the following regex:

(?s:/\*.*?\*/)|//.*

See regex101 for demo.

In Java code, that would be:

String clean = original.replaceAll("(?s:/\\*.*?\\*/)|//.*", "");
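
Note that the regex has to see the whole file as one string. If it is applied line by line, as in the readLine() loop from the question, a /* ... */ comment that spans several lines never matches, and readLine() also drops the line terminators, so the output ends up on a single line. Below is a minimal sketch of reading everything first, assuming UTF-8 files; the class name CommentStripper and the helper stripFile are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CommentStripper {
    // Same regex as above: block comments (DOTALL only inside the group) or line comments.
    static String delComments(String content) {
        return content.replaceAll("(?s:/\\*.*?\\*/)|//.*", "");
    }

    // Hypothetical helper: reads the whole file into one String (UTF-8 assumed)
    // so that a /* ... */ comment spanning several lines can match.
    static void stripFile(Path in, Path out) throws IOException {
        String content = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, delComments(content).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CommentStripper <input file> <output file>
        stripFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Lines that contained only a comment are left behind as empty lines, because the regex removes the comment text but not the line terminator after it.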

Caveat: It doesn't recognize string literals, and /* or // inside a string literal does not start a Java comment. This regex will however think it is one and remove content from string literals (and beyond).
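
To make the caveat concrete, here is a small sketch with a made-up input line (the URL and variable name are only for illustration) where // appears inside a string literal:

```java
class CaveatDemo {
    static String delComments(String content) {
        return content.replaceAll("(?s:/\\*.*?\\*/)|//.*", "");
    }

    public static void main(String[] args) {
        // The "//" inside the string literal is treated as the start of a comment.
        String line = "String url = \"http://example.com\"; // homepage";
        System.out.println(delComments(line));
        // prints: String url = "http:
    }
}
```

Everything from the // in the URL to the end of the line is removed, which leaves broken code behind.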


The unrolled version is:

String clean = original.replaceAll("/\\*[^*]*(?:\\*(?!/)[^*]*)*\\*/|//.*", "");

No noticeable difference on the given text. If the 3-line comment is made 3000 characters long, the unrolled version is somewhat faster, but not enough to notice unless you're doing 10000+ replacements, so I'd consider this premature optimization.
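
If you want to convince yourself that the two patterns behave the same on your input, a quick check along these lines should do. This is only a sketch: the class name UnrolledCheck and the sample string are made up, and it compares results, not timings.

```java
class UnrolledCheck {
    static final String LAZY = "(?s:/\\*.*?\\*/)|//.*";
    static final String UNROLLED = "/\\*[^*]*(?:\\*(?!/)[^*]*)*\\*/|//.*";

    // True if both regexes strip exactly the same text from the input.
    static boolean sameResult(String text) {
        return text.replaceAll(LAZY, "").equals(text.replaceAll(UNROLLED, ""));
    }

    public static void main(String[] args) {
        String sample = "a();/*Comment here\n*and\n*here*/b();//Comment this\n//and comment that\n}";
        System.out.println(sameResult(sample));
    }
}
```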

Upvotes: 2
