Reputation: 41
I tried using a regular expression to filter single-line and multi-line comments out of my text file. I am able to filter comments like
//it works
/*
* welcome
*/
/* hello*/
but I am not able to remove the following comment
/*
sample
*/
This is my code:
import java.io.*;
import java.lang.*;
class TestProg
{
public static void main(String[] args) throws IOException {
removeComment();
}
static void removeComment() throws IOException
{
try {
BufferedReader br = new BufferedReader(new FileReader("d:\\data.txt"));
String line;
while((line = br.readLine()) != null){
if(line.contains("/*") && line.contains("*/") || line.contains("//")) {
System.out.println(line.replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)",""));
}
else if(line.contains("/*") || line.contains("*") || line.contains("*/")) {
continue;
}
else
System.out.println(line);
}
br.close();
}
catch(IOException e) {
System.out.println("OOPS! File could not read!");
}
}
}
Please help me to solve this...
Thanks in advance.
Upvotes: 4
Views: 5704
Reputation: 88707
Since you read each line individually, a single regex can't match a comment that spans several lines. Instead, you have to look for single-line comments (//.*) as well as the start and end of multi-line comments (/\*.* and .*\*/). When you find the start of a multi-line comment, keep track of that and treat everything as a comment until you encounter the matching end.
Example:
boolean inComment = false;
String line;
while ((line = br.readLine()) != null) {
    // inside a multiline comment, ignore the line unless it contains the end marker
    if (inComment) {
        if (line.contains("*/")) {
            // end of multiline, remove everything up to and including the first */
            // note the reluctant quantifier *? which is necessary to match as little as possible
            // (otherwise .* would run on to the last */ on the line)
            System.out.println(line.replaceFirst(".*?\\*/", ""));
            inComment = false;
        }
        continue;
    }
    // single line comment, remove everything after the first //
    if (line.contains("//")) {
        System.out.println(line.replaceAll("//.*", ""));
    }
    // multiline comment that starts and ends on the same line
    else if (line.contains("/*") && line.contains("*/")) {
        System.out.println(line.replaceAll("/\\*.*?\\*/", ""));
    }
    // start of multiline, remove everything after the first /*
    else if (line.contains("/*")) {
        System.out.println(line.replaceAll("/\\*.*", ""));
        inComment = true;
    }
    // no comment on this line, print it unchanged
    else {
        System.out.println(line);
    }
}
Edit: an important addition
In your question you talk about text files, which normally have a regular structure, so my answer applies to them.
But if, as you stated in the title, the files contain Java code, then you have an irregular problem domain. For example, a string literal such as "http://example.com" contains // that is not a comment. In that case you can't safely apply a regex and should use a Java parser instead.
For more information, have a look here: RegEx match open tags except XHTML self-contained tags. Although that question is about applying regex to HTML, the same holds for applying regex to Java, since both are irregular problem domains.
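To illustrate why string literals matter, here is a minimal sketch of a hand-written scanner that walks the text character by character and skips comments only when it is not inside a literal. The class name CommentStripper and the method stripComments are hypothetical names chosen for this example. It is not a full Java lexer: text blocks ("""...""") and unicode escapes are not handled, so a real parser remains the safer choice.

```java
public class CommentStripper {

    // Hypothetical helper for illustration: removes // and /* */ comments
    // while copying string and char literals through unchanged.
    static String stripComments(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Copy a string or char literal verbatim, honoring backslash escapes.
                char quote = c;
                out.append(c);
                i++;
                while (i < n) {
                    char d = src.charAt(i);
                    out.append(d);
                    i++;
                    if (d == '\\' && i < n) {
                        // Escaped character: copy it without interpreting it.
                        out.append(src.charAt(i));
                        i++;
                    } else if (d == quote || d == '\n') {
                        // Closing quote (or an unterminated literal at end of line).
                        break;
                    }
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip up to, but not including, the line break.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: skip past the closing */ (or to end of input).
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "String url = \"http://example.com\"; // trailing comment\n"
                   + "/* block\n   comment */int x = 1;\n";
        System.out.print(stripComments(src));
    }
}
```

Because the literal branch is checked before the comment branches, the // inside "http://example.com" is copied as ordinary text, while the trailing line comment and the block comment are dropped.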
Upvotes: -1
Reputation: 22973
Using JavaParser, you could solve it as shown in this proof of concept.
RemoveAllComments
import japa.parser.JavaParser;
import japa.parser.ParseException;
import japa.parser.ast.CompilationUnit;
import japa.parser.ast.Node;
import java.io.File;
import java.io.IOException;
public class RemoveAllComments {
static void removeComments(Node node) {
for (Node child : node.getChildrenNodes()) {
child.setComment(null);
removeComments(child);
}
}
public static void main(String[] args) throws ParseException, IOException {
File sourceFile = new File("TestClass.java");
CompilationUnit cu = JavaParser.parse(sourceFile);
removeComments(cu);
System.out.println(cu.toString());
}
}
TestClass.java used as an example input source
/**
* javadoc comment
*/
class TestClass {
/*
* block comment
*/
static class Cafebabe {
}
// line comment
static interface Commentable {
}
public static void main(String[] args) {
}
}
Output to stdout (storing it in a file is up to you)
class TestClass {
static class Cafebabe {
}
static interface Commentable {
}
public static void main(String[] args) {
}
}
Upvotes: 3
Reputation: 2554
Try out the following code:
// Read the entire file into a string
BufferedReader br = new BufferedReader(new FileReader("filename"));
StringBuilder builder = new StringBuilder();
int c;
while((c = br.read()) != -1){
builder.append((char) c);
}
String fileData = builder.toString();
// Remove comments
String fileWithoutComments = fileData.replaceAll("([\\t ]*\\/\\*(?:.|\\R)*?\\*\\/[\\t ]*\\R?)|(\\/\\/.*)", "");
System.out.println(fileWithoutComments);
It first reads the entire file into a string and then removes all comments from it. An explanation of the regex can be found here: https://regex101.com/r/vK6lC4/3
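As a self-contained illustration of the whole-text approach, the sketch below applies one pattern to a string that includes the multi-line comment from the question. The pattern is a simplified variant, not the exact regex above: it is an assumption of this example that the DOTALL flag (?s) is acceptable, and the names RegexCommentDemo and removeComments are hypothetical. Like any regex-only solution, it would also strip comment markers that appear inside string literals.

```java
import java.util.regex.Pattern;

public class RegexCommentDemo {

    // Simplified variant (an assumption, not the answer's exact regex):
    // (?s) lets '.' match line breaks, so /\*.*?\*/ can span several lines;
    // //[^\r\n]* removes a line comment up to the end of its line.
    static final Pattern COMMENT = Pattern.compile("(?s)/\\*.*?\\*/|//[^\\r\\n]*");

    // Hypothetical helper: returns the text with all matched comments removed.
    static String removeComments(String text) {
        return COMMENT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "//it works\n"
                    + "/*\nsample\n*/\n"
                    + "int x = 1; /* hello*/\n";
        System.out.print(removeComments(text));
    }
}
```

Because the pattern sees the whole text at once, the comment that starts on one line and ends two lines later is matched as a single unit, which the line-by-line code in the question cannot do.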
Upvotes: -1
Reputation: 1859
Try this code
import java.io.*;
import java.lang.*;
class Test {
public static void main(String[] args) throws IOException {
removeComment();
}
static void removeComment() throws IOException {
try {
BufferedReader br = new BufferedReader(new FileReader("d:\\fmt.txt"));
String line;
boolean comment = false;
while ((line = br.readLine()) != null) {
if (line.contains("/*")) {
comment = true;
continue;
}
if(line.contains("*/")){
comment = false;
continue;
}
if(line.contains("//")){
continue;
}
if(!comment){
System.out.println(line);
}
}
br.close();
}
catch (IOException e) {
System.out.println("OOPS! File could not read!");
}
}
}
I used the following code as input:
package test;
public class ClassA extends SuperClass {
/**
*
*/
public void setter(){
super.set(10);
}
/* public void printer(){
super.print();
}
*/
public static void main(String[] args) {
// System.out.println("hi");
}
}
My output is:
package test;
public class ClassA extends SuperClass {
public void setter(){
super.set(10);
}
public static void main(String[] args) {
}
}
Upvotes: 0