What regex expression will operate with Java's "replaceAll" function to remove the
<p> html tag and its contents from an html string?

Question

What regex will work with the Java replaceAll() method to remove every <p> tag, together with the contents between its opening and closing tags, from an HTML string?

For example, after applying the method,

"<p>table test title</p>
this is table cell value
<p>miscellaneous contents</p>
blah"

becomes:

"this is table cell value
blah"

Note: This is an "academic" exercise. I am not seeking a solution that uses an XML/HTML parser.

UPDATE:

Getting closer to a solution on this (thanks, jlordo!). Your pattern seems to work somewhat.

However, the suggested regex string ("<[pP]>.*?</[pP]>") does not appear to have any effect on a <p> tag that contains an attribute (in this case, a "style" attribute); see below:

    public static void main(String[] args)
    {
        // The first <p> carries a style attribute (its value is elided here).
        String htmlstring = "<p style=\"...\">[click the submit button to create the new company.]</p>"
                + "this is table cell value"
                + "<p>miscellaneous contents</p>"
                + "blah";
        htmlstring = htmlstring.replaceAll("<[pP]>.*?</[pP]>", "");
    }

htmlstring (before scrubbing):

<p style="...">[click the submit button to create the new company.]</p>
this is table cell value
<p>miscellaneous contents</p>
blah

htmlstring (after scrubbing):

<p style="...">[click the submit button to create the new company.]</p>
this is table cell value
blah

Is there anything we can do to "tweak" it so that it handles this issue?

Evgeniy Dorofeev · Accepted Answer

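Try:

    htmlstring = htmlstring.replaceAll("(?i)<p\\b[^>]*>.*?</p>", "");

Note that (?i) turns on the case-insensitive flag, so the pattern matches both <p> and <P>. The [^>]* part allows any attributes inside the opening tag.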
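For illustration, here is a minimal sketch of how a pattern of this kind can be made tolerant of attributes and of paragraphs that span several lines. The class name StripParagraphs, the method stripParagraphs, and the sample input are hypothetical and not part of the original question. The sketch assumes well-formed input in which every <p> has a closing </p> and no attribute value contains a ">" character; as with any regex-based approach, malformed HTML may defeat it.

```java
import java.util.regex.Pattern;

public class StripParagraphs {
    // (?i) makes the match case-insensitive; (?s) lets '.' match line terminators too.
    // <p\b[^>]*> matches an opening <p> tag with optional attributes; the \b keeps
    // other tags that merely start with "p" (such as <pre>) from matching.
    // .*? is lazy, so each match stops at the nearest closing </p>.
    private static final Pattern P_ELEMENT =
            Pattern.compile("(?is)<p\\b[^>]*>.*?</p\\s*>");

    public static String stripParagraphs(String html) {
        return P_ELEMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the question.
        String html = "<P style=\"color:red\">first</P>keep this\n"
                + "<p>second\nparagraph</p>and this<pre>code</pre>";
        System.out.println(stripParagraphs(html));
    }
}
```

Precompiling the Pattern is optional; passing the same regex string to String.replaceAll behaves the same way, but recompiles the pattern on every call.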
