Koerr

Reputation: 15723

How to strip HTML tags but keep specified ones

I am using this pattern to remove all HTML tags (Java code):

String html="text <a href=#>link</a> <b>b</b> pic<img src=#>";
html=html.replaceAll("\\<.*?\\>", "");

System.out.println(html);

Now I want to keep the <a ...> tag (along with </a>) and the <img ...> tag.

I want the result to be:

text <a href=#>link</a> b pic<img src=#>

How can I do this?


I don't want to use an HTML parser for this: I need the pattern to filter a large number of HTML fragments, so I want a regex-based solution.

Upvotes: 1

Views: 875

Answers (5)

Huy - Logarit

Reputation: 686

I recommend using strip_tags (a PHP function):

string strip_tags ( string $str [, string $allowable_tags ] )

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

Output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>

Upvotes: -1

Mark Byers

Reputation: 838096

You could do this using a negative lookahead:

"<(?!(?:a|/a|img)\\b).*?>"

Rubular

However, this has a number of problems, and I would instead recommend using an HTML parser if you want a robust solution.
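As a rough illustration, here is how the lookahead pattern could be applied to the string from the question. This is only a sketch: the class name KeepTags and the method name stripExceptLinksAndImages are made up for the example, and it assumes the input contains only simple tags like the ones in the question.

```java
import java.util.regex.Pattern;

public class KeepTags {
    // Matches any tag unless "<" is directly followed by a, /a or img as a whole word.
    // Compiled once so it can be reused across many fragments.
    private static final Pattern UNWANTED_TAG =
            Pattern.compile("<(?!(?:a|/a|img)\\b).*?>");

    // Hypothetical helper: removes every tag the pattern matches.
    static String stripExceptLinksAndImages(String html) {
        return UNWANTED_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "text <a href=#>link</a> <b>b</b> pic<img src=#>";
        System.out.println(stripExceptLinksAndImages(html));
        // prints: text <a href=#>link</a> b pic<img src=#>
    }
}
```

The \b matters here: without it, tags such as <abbr> would be kept because their names start with "a". Note that the pattern is case-sensitive as written (so <A HREF=#> would be stripped), and an attribute value containing ">" ends the match early, which is one of the problems mentioned above.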

For more information see this question:

Upvotes: 3

Daniel Cassidy

Reputation: 25607

Use a proper HTML parser, for example htmlparser, Jericho or the validator.nu HTML parser. Then use the parser’s API, SAX or DOM to pull out the stuff you’re interested in.

If you insist on using regular expressions, you’re almost certain to make some small mistake that will lead to breakage, and possibly to cross-site scripting attacks, depending on what you’re doing with the markup.

See also this answer.

Upvotes: 0

Kerem Baydoğan

Reputation: 10722

Hey! Here is your answer:

You can’t parse [X]HTML with regex.

Upvotes: 0

Gadolin

Reputation: 2686

Check out http://sourceforge.net/projects/regexcreator/. It is a very handy GUI regex editor.

Upvotes: 0
