Reputation: 1299
I have a Java string such as this:
String string = "I <strong>really</strong> want to get rid of the strong-tags!";
And I want to remove the tags. I have some other strings where the tags are way longer, so I'd like to find a way to remove everything between "<>" characters, including those characters.
One way would be to use the built-in string method that matches the string against a regex, but I have no idea how to write those.
Upvotes: 6
Views: 17479
Reputation: 626738
You should use
String stripped = html.replaceAll("<[^>]*>", "");
or
String stripped = html.replaceAll("<[^<>]*>", "");
where <[^>]*> matches substrings starting with <, then zero or more chars other than > (or chars other than < and > if you choose the second version), and then a > char.
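Note that <.*?> does not match tags that span line breaks, because . does not match line terminators by default. To handle those you would need (?s)<.*?>, <(?s:.)*?>, <[\w\W]*?>, or one of many other not-so-efficient variations. See the regex demo.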
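As a minimal sketch of the character-class pattern in use, the class below compiles it once and applies it to the string from the question and to a tag that spans a line break. The class name TagStripper, the method stripTags, and the multi-line sample string are made up for illustration.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Compiled once and reused; [^>]* also matches line breaks,
    // so no (?s) flag is needed.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String string = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(stripTags(string));
        // prints: I really want to get rid of the strong-tags!

        // Hypothetical input with a tag that spans a line break.
        String multiline = "a<b\nclass=\"x\">c</b>";
        System.out.println(stripTags(multiline)); // prints: ac

        // For comparison, the lazy dot pattern leaves the multi-line tag in place
        // and only removes </b>.
        System.out.println(multiline.replaceAll("<.*?>", ""));
    }
}
```

Precompiling the Pattern is optional; String.replaceAll compiles the regex on every call, which only matters if the method runs in a hot loop.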
Upvotes: 0
Reputation: 47097
To avoid regex, you can use StringUtils from Apache Commons Lang:
String toRemove = StringUtils.substringBetween(string, "<", ">");
String result = StringUtils.remove(string, "<" + toRemove + ">");
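For multiple instances:
String[] allToRemove = StringUtils.substringsBetween(string, "<", ">");
String result = string;
// substringsBetween returns null when nothing matches, so guard the loop
if (allToRemove != null) {
    for (String toRemove : allToRemove) {
        result = StringUtils.remove(result, "<" + toRemove + ">");
    }
}
Apache StringUtils functions are null-, empty-, and no-match-safe, but substringsBetween returns null (not an empty array) when there is no match, hence the null check before the loop.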
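If adding Apache Commons Lang is not an option, roughly the same no-regex idea can be sketched in plain Java with indexOf. This is an illustrative alternative rather than what StringUtils does internally; the class name NoRegexStripper and the method stripTags are hypothetical.

```java
public class NoRegexStripper {
    // Removes every "<...>" span by scanning for '<' and the next '>'.
    static String stripTags(String s) {
        if (s == null) {
            return null; // mirror the null-safe behaviour of StringUtils
        }
        StringBuilder out = new StringBuilder(s.length());
        int pos = 0;
        while (true) {
            int open = s.indexOf('<', pos);
            int close = (open < 0) ? -1 : s.indexOf('>', open + 1);
            if (close < 0) {
                // No complete <...> pair left: keep the rest unchanged.
                out.append(s, pos, s.length());
                return out.toString();
            }
            out.append(s, pos, open); // text before the tag
            pos = close + 1;          // continue after the closing '>'
        }
    }

    public static void main(String[] args) {
        String string = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(stripTags(string));
        // prints: I really want to get rid of the strong-tags!
    }
}
```

An unmatched < with no later > is left in place, so a string such as "a < b" comes back unchanged.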
Upvotes: 4
Reputation: 424983
Caution is advised when using regex to parse HTML (due to its allowable complexity); however, for "simple" HTML and simple text (text without a literal < or > in it), this will work:
String stripped = html.replaceAll("<.*?>", "");
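To make the caveat about literal < and > concrete, here is a small sketch that runs the same pattern on the string from the question and on a made-up sentence containing comparison signs. The class name LazyStripDemo, the method strip, and the second sample string are hypothetical.

```java
public class LazyStripDemo {
    static String strip(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        // Simple HTML: works as intended.
        String simple = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(strip(simple));
        // prints: I really want to get rid of the strong-tags!

        // Text with literal < and >: everything from the first < to the
        // next > is treated as a tag and removed, including real content.
        String tricky = "if a < b and c > d then <em>stop</em>";
        System.out.println(strip(tricky));
        // prints: if a  d then stop
    }
}
```

The second output loses "b and c", which is why the pattern is only safe when the text itself contains no literal angle brackets.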
Upvotes: 22