Rickard

Reputation: 1299

Removing a substring between two characters (Java)

I have a Java string such as this:

String string = "I <strong>really</strong> want to get rid of the strong-tags!";

I want to remove the tags. Some of my other strings contain much longer tags, so I'd like a way to remove everything between the "<" and ">" characters, including the characters themselves.

One way would be to use the built-in String method that matches the string against a regex, but I have no idea how to write one.

Upvotes: 6

Views: 17479

Answers (3)

Wiktor Stribiżew

Reputation: 626738

You should use either

String stripped = html.replaceAll("<[^>]*>", "");

or

String stripped = html.replaceAll("<[^<>]*>", "");

where <[^>]*> matches a substring that starts with <, continues with zero or more characters other than > (other than both < and > in the second version), and ends with >.
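As a quick check, the first pattern can be applied to the string from the question. This is only a minimal sketch; the class name StripTags and the helper stripTags are made up for illustration.

```java
public class StripTags {
    // Hypothetical helper: removes every "<...>" run from the input.
    // [^>]* stops at the first '>', so each tag is matched on its own.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String string = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(stripTags(string));
        // prints: I really want to get rid of the strong-tags!
    }
}
```

Because the character class excludes only >, it also matches tags that span several lines.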

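Note that <.*?> does not match a tag that spans several lines unless DOTALL mode is enabled, because . does not match line terminators by default.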

See the regex demo.

Upvotes: 0

Gibolt

Reputation: 47097

To avoid regex, you can use StringUtils from Apache Commons Lang:

String toRemove = StringUtils.substringBetween(string, "<", ">");
String result = StringUtils.remove(string, "<" + toRemove + ">"); 

For multiple instances:

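String[] allToRemove = StringUtils.substringsBetween(string, "<", ">");
String result = string;
// substringsBetween returns null when nothing matches, so guard the loop
if (allToRemove != null) {
  for (String toRemove : allToRemove) {
    result = StringUtils.remove(result, "<" + toRemove + ">");
  }
}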

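The Apache Commons Lang StringUtils methods are null-safe and empty-safe, but substringsBetween returns null when nothing matches, so check the array before iterating over it.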
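If adding Apache Commons Lang is not an option, the same no-regex idea can be sketched with plain JDK calls. This is an alternative to the StringUtils approach, not a copy of it; the class NoRegexStrip and the method removeBetween are hypothetical names used only for this example.

```java
public class NoRegexStrip {
    // Hypothetical helper: copies everything that lies outside open...close pairs.
    static String removeBetween(String s, char open, char close) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length());
        int pos = 0;
        while (pos < s.length()) {
            int start = s.indexOf(open, pos);
            int end = (start < 0) ? -1 : s.indexOf(close, start + 1);
            if (end < 0) {
                // No complete pair left: keep the rest of the string as is.
                out.append(s, pos, s.length());
                break;
            }
            out.append(s, pos, start); // keep the text before the opener
            pos = end + 1;             // skip past the closer
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String string = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(removeBetween(string, '<', '>'));
        // prints: I really want to get rid of the strong-tags!
    }
}
```

An unmatched opener such as the < in "a < b" is left in place, since there is no closing character to pair it with.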

Upvotes: 4

Bohemian

Reputation: 424983

Caution is advised when using regex to parse HTML (due to its allowable complexity). However, for "simple" HTML and simple text (text without a literal < or > in it), this will work:

String stripped = html.replaceAll("<.*?>", "");
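To make the caveat concrete, here is a small sketch that runs the pattern on the string from the question and on a string containing a literal <. The class name LazyStripDemo and the helper strip are made up for illustration.

```java
public class LazyStripDemo {
    // Hypothetical helper wrapping the lazy pattern shown above.
    static String strip(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        // Simple HTML: both tags are removed.
        System.out.println(strip("I <strong>really</strong> want to get rid of the strong-tags!"));
        // prints: I really want to get rid of the strong-tags!

        // A literal '<' in the text starts a match that runs to the next '>',
        // so " b then " is lost along with the tags.
        System.out.println(strip("if a < b then <b>bold</b>"));
        // prints: if a bold
    }
}
```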

Upvotes: 22
