mat_boy

Reputation: 13666

Remove all the substrings within a pair of delimiters in Java

The question may seem trivial, but I have to preprocess several GB of text files.

Is there an efficient and possibly elegant way in Java to remove from a String all the substrings that lie between two Strings used as delimiters? For example, with the delimiters ([ and ]), the String "Hi ([bla bla]) how are ([test]) you?" should become the new String "Hi how are you?".

The simplest way that I found is the following:

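String text = "Hi ([bla bla]) how are ([test]) you?";
while (text.contains("([") && text.contains("])")) {
  int start = text.indexOf("([");
  // Also skip the single space after "])" so no double space is left; clamp at the end of the string
  int end = Math.min(text.indexOf("])") + "])".length() + 1, text.length());
  text = text.substring(0, start) + text.substring(end);
}
System.out.println(text);  // Prints "Hi how are you?"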

where ([ and ]) are the delimiters.

Widely used external libraries (e.g. Apache libraries) are also welcome, but the standard Java API is preferred.

Upvotes: 0

Views: 1428

Answers (3)

robermann

Reputation: 1722

A regular expression is the easiest way, but on big files the more efficient way in Java is probably a single sequential scan of the input, that is, reading it byte by byte with a RandomAccessFile: http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html.
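A single-pass scan along these lines could look like the sketch below. It is only an illustration under stated assumptions: it swaps the RandomAccessFile for a buffered Reader and Writer (so character decoding is handled by the standard library), `StreamingStripper` and `strip` are hypothetical names, spans are assumed not to nest, and an unterminated span at the end of the input is dropped.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: copies 'in' to 'out', skipping everything between open and close.
final class StreamingStripper {

    static void strip(Reader in, Writer out, String open, String close) throws IOException {
        // Holds the most recent characters that might still form a delimiter.
        StringBuilder window = new StringBuilder();
        boolean inside = false; // true while we are between open and close
        int c;
        while ((c = in.read()) != -1) {
            window.append((char) c);
            String target = inside ? close : open;
            if (window.length() < target.length()) {
                continue; // not enough characters yet to compare
            }
            if (target.contentEquals(window)) {
                inside = !inside;     // delimiter found: switch state, discard it
                window.setLength(0);
            } else {
                if (!inside) {
                    out.write(window.charAt(0)); // oldest char cannot start a delimiter
                }
                window.deleteCharAt(0);
            }
        }
        if (!inside) {
            out.write(window.toString()); // flush the tail; an unterminated span is dropped
        }
    }

    // Assumed usage: java StreamingStripper input.txt output.txt
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]))) {
            strip(in, out, "([", "])");
        }
    }
}
```

Memory use stays constant regardless of file size, and spans that cross line breaks are handled. Unlike the loop in the question, this sketch leaves the whitespace around a removed span untouched, so the example sentence comes out with double spaces.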

Upvotes: 0

Tschallacka

Reputation: 28732

Try replaceAll (backslashes must be doubled inside a Java string literal):

input = input.replaceAll("\\[[^\\]]*\\]", "");

Upvotes: 0

arshajii

Reputation: 129537

As long as there is no nesting involved, you can use regular expressions:

text = text.replaceAll("\\(\\[.*?\\]\\)", "");

If you want to deal with spaces:

text = text.replaceAll("\\s*\\(\\[.*?\\]\\)\\s*", " ");
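For repeated use on many strings, the same idea can be wrapped so the pattern is compiled once and the delimiters are escaped automatically. This is a minimal sketch, not a definitive implementation: `DelimiterStripper` is a hypothetical class name, spans are assumed not to nest, and the `DOTALL` flag is an added assumption so that a span may cross a line break.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: strips non-nested delimited spans.
final class DelimiterStripper {
    private final Pattern span;

    DelimiterStripper(String open, String close) {
        // Pattern.quote escapes regex metacharacters such as ( [ ] ).
        // DOTALL lets .*? match across line breaks; drop it if spans stay on one line.
        this.span = Pattern.compile(
                "\\s*" + Pattern.quote(open) + ".*?" + Pattern.quote(close) + "\\s*",
                Pattern.DOTALL);
    }

    // Replaces each span, together with the whitespace around it, by a single space.
    String strip(String text) {
        return span.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        DelimiterStripper stripper = new DelimiterStripper("([", "])");
        // Prints "Hi how are you?"
        System.out.println(stripper.strip("Hi ([bla bla]) how are ([test]) you?"));
    }
}
```

A span at the very start or end of the text leaves a single leading or trailing space behind, so a final `trim()` may be wanted depending on the data.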

Upvotes: 3
