Akari

Reputation: 856

How to use regular expressions to remove some HTML tags from a string in Java

I wrote code to read news from an XML feed, and I need to display the description of each item in my list view. I used this piece of code to remove the HTML tags inside the description tag:

    else if ("description".equals(tagName)) {
        sourcedescription = parser.nextText();
        // Html.fromHtml() parses the markup; toString() keeps only the text
        description = Html.fromHtml(sourcedescription).toString();
        Log.d("msg", description);
        feedDescription.add(description);
    }

For some items the description is displayed correctly, without tags. But for other items, which contain an {iframe} {/iframe} tag, not all of the tags are removed. I think this tag appears in the description of items that have "no description":

<description><![CDATA[<p>{iframe height="600"}<a href="http://admreg.yu.edu.jo/index.php?option=com_content&view=article&id=606:------20132014&catid=87:2011-01-25-18-12-08&Itemid=438">http://admreg.yu.edu.jo/index.php?option=com_content&view=article&id=606:------20132014&catid=87:2011-01-25-18-12-08&Itemid=438</a><span style="line-height: 1.3em;">{/iframe}</span></p>]]></description>

My question is: how can I remove the iframe tag using regular expressions?

Upvotes: 0

Views: 1902

Answers (4)

hwnd

Reputation: 70722

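Note: use a parser if you have the option. That said, for a quick and dirty fix that removes only the tags themselves:

str = str.replaceAll("\\{/?iframe.*?\\}", "");

To remove the tags together with the content between them:

str = str.replaceAll("\\{iframe.*?\\}.*?\\{/iframe\\}", "");

Strings are immutable in Java, so the result of replaceAll has to be assigned back.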
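To make the difference between the two patterns concrete, here is a minimal sketch that runs both on a shortened version of the description from the question. The class name, method names, and the example.com URL are made up for illustration. One assumption added here: the second pattern uses the (?s) flag so that '.' also matches line breaks, in case the content between the markers spans several lines.

```java
import java.util.regex.Pattern;

public class IframeMarkerDemo {
    // Matches a single {iframe ...} or {/iframe} marker and nothing else.
    private static final Pattern MARKER =
            Pattern.compile("\\{/?iframe.*?\\}");

    // Matches the opening marker, the content after it, and the closing marker.
    // (?s) is an addition to the pattern above: it lets '.' match line breaks.
    private static final Pattern BLOCK =
            Pattern.compile("(?s)\\{iframe.*?\\}.*?\\{/iframe\\}");

    // Removes only the markers and keeps whatever is between them.
    static String stripMarkers(String s) {
        return MARKER.matcher(s).replaceAll("");
    }

    // Removes the markers and everything between them.
    static String stripBlocks(String s) {
        return BLOCK.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample in the shape of the feed item.
        String sample = "<p>{iframe height=\"600\"}"
                + "<a href=\"http://example.com\">link</a>{/iframe}</p>";
        System.out.println(stripMarkers(sample));
        System.out.println(stripBlocks(sample));
    }
}
```

In the question's code, a call like stripMarkers(sourcedescription) would presumably go before Html.fromHtml, so that the remaining real HTML is still handled by the HTML parser.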

Upvotes: 1

moliware

Reputation: 10278

A possible solution would be:

    String regexp = "\\{/?iframe.*?\\}";
    String text = "<description><![CDATA[<p>{iframe height=\"600\"}<a href=\"http://admreg.yu.edu.jo/index.php?option=com_content&view=article&id=606:------20132014&catid=87:2011-01-25-18-12-08&Itemid=438\">http://admreg.yu.edu.jo/index.php?option=com_content&view=article&id=606:------20132014&catid=87:2011-01-25-18-12-08&Itemid=438</a><span style=\"line-height: 1.3em;\">{/iframe}</span></p>]]></description>";
    System.out.println(text.replaceAll(regexp, ""));

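If you want to remove the content inside the iframe tag as well, use this regexp instead. Note that it requires a space after "iframe", so it only matches opening tags that have attributes:

text.replaceAll("\\{iframe .*?\\}.*?\\{/iframe\\}", "");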

Upvotes: 2

Display Name

Reputation: 8128

HTML is not a regular language. Don't use RegEx with it, or you'll die.

Upvotes: 0

fabien

Reputation: 1549

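Use these regexes:

\{iframe[^\}]*\}   // matches the opening tag
\{/iframe[^\}]*\}  // matches the closing tag

In a Java string literal each backslash must be doubled, e.g. "\\{iframe[^\\}]*\\}". These regexes remove only the tags themselves, not the content between them.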
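As a rough sketch of how this could look in Java, the two expressions can be merged into one by making the slash optional. The class and method names below are hypothetical. The negated class [^}] stops at the first closing brace, so no lazy quantifier is needed, and inside a character class the brace does not have to be escaped.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // One pattern for both {iframe ...} and {/iframe}: the slash is optional,
    // and [^}]* consumes everything up to the first closing brace.
    private static final Pattern IFRAME_MARKER =
            Pattern.compile("\\{/?iframe[^}]*\\}");

    // Deletes the markers and leaves the text between them untouched.
    static String removeMarkers(String input) {
        return IFRAME_MARKER.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input, not taken from the feed.
        String input = "before {iframe width=\"100\"}inside{/iframe} after";
        System.out.println(removeMarkers(input));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it for every feed item, which String.replaceAll would otherwise do on each call.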

Upvotes: 2
