Reputation: 81
I am currently stuck on writing a regex for strings such as <b>abc<br/></b>, xy<i>abcd<br/></i>, <th>ab<br/></th>wvx, etc.
My requirement is to remove the < and > characters of tags such as <b>, </b>, <i>, </i>, <th>, </th>, etc. using Java's replaceAll(<regex>, "") method, without removing the < and > characters of the <br/> tag.
Examples:
Input: <b>abc<br/></b>
Output should be: babc<br/>/b
Input: xy<i>abcd<br/></i>
Output should be: xyiabcd<br/>/i
Input: <th>ab<br/></th>wvx
Output should be: thab<br/>/thwvx
....... etc.
Please help me to resolve this.
Upvotes: 0
Views: 1558
Reputation: 521083
You may try using String#replaceAll:
String input = "<b>abc<br/></b>";
input = input.replaceAll("</?(?!br)([^>]+)>", "$1");
System.out.println(input);
Output: babc<br/>b
The pattern </?(?!br)([^>]+)> will match any opening or closing HTML tag whose name does not start with br. It will replace that tag with just the text name of the tag, so the slash of a closing tag is dropped as well (</b> becomes b, not /b).
Note that parsing HTML with regex generally is not a good idea. This may work in your case if you only have single level HTML as in your example strings.
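If the slash of a closing tag has to survive, as in the expected outputs in the question, one possible variation is to move the optional slash inside the capturing group. The sketch below is only an illustration under that assumption: the class name TagBracketStripper and the method stripBrackets are hypothetical, and the pattern is a small modification of the one above, not a general HTML solution.

```java
// Hypothetical helper; the class and method names are illustrative only.
class TagBracketStripper {

    // Group 1 now includes the optional leading slash, so "$1" keeps it.
    // (?!br) still skips any tag whose name starts with "br", e.g. <br/>.
    static String stripBrackets(String input) {
        return input.replaceAll("<(/?(?!br)[^>]+)>", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripBrackets("<b>abc<br/></b>"));     // babc<br/>/b
        System.out.println(stripBrackets("xy<i>abcd<br/></i>"));  // xyiabcd<br/>/i
        System.out.println(stripBrackets("<th>ab<br/></th>wvx")); // thab<br/>/thwvx
    }
}
```

Because [^>]+ accepts anything up to the next >, attributes inside a tag would be kept in the output too; that may or may not be what you want.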
Upvotes: 1
Reputation: 1196
</?([a-z]+)> should do. If a slash comes after the letters, as in <br/>, it will not match.
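A minimal sketch of how this letters-only idea could be used, assuming the slash of a closing tag should be kept as in the question's expected outputs. To do that, the slash is moved into the capturing group, which is a small change from the pattern above; the class name SimpleTagStripper and the method strip are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class SimpleTagStripper {

    // An optional slash followed by lowercase letters, then ">" immediately.
    // <br/> is skipped because "/" (not ">") follows the letters.
    private static final Pattern TAG = Pattern.compile("<(/?[a-z]+)>");

    static String strip(String input) {
        return TAG.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("<b>abc<br/></b>"));     // babc<br/>/b
        System.out.println(strip("<th>ab<br/></th>wvx")); // thab<br/>/thwvx
    }
}
```

Note that this approach relies on the self-closing form: a plain <br> would match and lose its brackets, while tags with uppercase letters, digits (such as <h1>), or attributes would be left untouched.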
Upvotes: 1