Reputation: 13666
This question may seem stupid, but I have to preprocess several GB of text files.
Is there an efficient and possibly elegant way in Java to remove from a String all the substrings that lie between two Strings used as delimiters? E.g. when you define two delimiters, say ([ and ]), then from the String "Hi ([bla bla]) how are ([test]) you?" a new String "Hi how are you?" must be returned.
The simplest way that I found is the following:
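String text = "Hi ([bla bla]) how are ([test]) you?";
while (text.contains("([") && text.contains("])")) {
    // Note: "]))" has 3 characters, one more than the closing delimiter "])",
    // so the character right after the delimiter (here a space) is dropped too.
    text = text.substring(0, text.indexOf("(["))
         + text.substring(text.indexOf("])") + "]))".length());
}
System.out.println(text); // Prints "Hi how are you?"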
where ([ and ]) are the delimiters.
Widely used external libraries (e.g. Apache libraries) are also welcome, but the standard Java API is preferred.
Upvotes: 0
Views: 1428
Reputation: 1722
A regular expression is the easier way, but on big files it is probably more efficient in Java to scan the input directly, that is, to read it byte by byte with a RandomAccessFile - http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html.
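To illustrate the scanning idea without regular expressions, here is a minimal sketch that works on a String already in memory rather than on a RandomAccessFile. The class DelimiterScan and the method removeDelimited are hypothetical names, not part of any library. It walks the text once with indexOf and copies only the parts outside the delimiters into a StringBuilder, assuming no nesting.

```java
class DelimiterScan {
    // Hypothetical helper: drops every span from `open` through the next `close`.
    // Assumes delimiters are not nested.
    static String removeDelimited(String text, String open, String close) {
        StringBuilder out = new StringBuilder(text.length());
        int pos = 0; // start of the next chunk to keep
        while (true) {
            int start = text.indexOf(open, pos);
            if (start < 0) break;              // no more opening delimiters
            int end = text.indexOf(close, start + open.length());
            if (end < 0) break;                // unmatched opener: keep the rest as-is
            out.append(text, pos, start);      // keep text before the opener
            pos = end + close.length();        // continue after the closer
        }
        out.append(text, pos, text.length());  // keep the tail
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Hi ([bla bla]) how are ([test]) you?";
        System.out.println(removeDelimited(text, "([", "])"));
    }
}
```

Unlike the loop in the question, this does not rebuild the whole String on every match, and it leaves the whitespace around the removed spans untouched, so the sample input keeps two spaces where each span was.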
Upvotes: 0
Reputation: 129537
As long as there is no nesting involved, you can use regular expressions:
text = text.replaceAll("\\(\\[.*?\\]\\)", "");
If you want to deal with spaces:
text = text.replaceAll("\\s*\\(\\[.*?\\]\\)\\s*", " ");
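If the delimiters are not fixed, the same idea can be sketched with Pattern.quote, so the delimiters do not have to be escaped by hand, and with a precompiled Pattern that is reused for every chunk of input. The class DelimiterRegex and its methods are hypothetical names. The DOTALL flag is an assumption added here so that a span may contain line breaks.

```java
import java.util.regex.Pattern;

class DelimiterRegex {
    // Builds a pattern matching `open`, anything (non-greedy), then `close`,
    // together with the whitespace around the span. Assumes no nesting.
    static Pattern between(String open, String close) {
        return Pattern.compile(
            "\\s*" + Pattern.quote(open) + ".*?" + Pattern.quote(close) + "\\s*",
            Pattern.DOTALL); // assumption: spans may contain line breaks
    }

    // Replaces every matched span (and its surrounding whitespace) with one space.
    static String strip(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        Pattern pattern = between("([", "])"); // compile once, reuse for all input
        System.out.println(strip(pattern, "Hi ([bla bla]) how are ([test]) you?"));
        // Prints "Hi how are you?"
    }
}
```

Compiling the pattern once matters for large inputs, because String.replaceAll compiles its regular expression again on every call.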
Upvotes: 3