Rishabh

Reputation: 199

Escaping HTML tags in an HTML report

I have to write an HTML report from a Java class, and the report contains the source code of web pages. The problem is that as soon as the browser encounters the source of a web page, it treats the embedded tags as markup of the main report page (for example, as the end of the report's own html tags), so the output is not rendered correctly. An example is shown below:

<html>
    <body>
        <li>
           <pre>
           <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
           <html>
           <head>
           <title>404 Not Found</title>
           </head><body>
           <h1>Not Found</h1>
           The page was not found on this server.
           </body>
           </html> 
           </pre>
        </li>
    </body>
</html>

I want everything inside the pre tags to be treated as plain text and not as HTML markup. I tried replacing < with &lt;, > with &gt;, & with &amp;, etc., but it doesn't seem to work. Any tips on how to make this possible?

EDIT: This is what I tried (a is the part inside the pre tags):

File aFile = new File(filename);
try {
    BufferedWriter out = new BufferedWriter(new FileWriter(aFile,aFile.exists()));  
    a.replaceAll("<","&lt;");a.replaceAll(">","&gt;");a.replaceAll("\"","&;quot;");a.replaceAll("&","&amp;");
    out.write(a + "\r\n");    
    out.close();
} 

EDIT 2:

So the correct solution involves a = a.replaceAll(...), but another thing to note is that if I replace < with &lt; and later replace & with &amp; (as I do in the example above), it will again mess up my output (< will become &amp;lt;). So the order must also be changed (replace & first and then <).
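To illustrate the ordering issue, here is a minimal sketch. The class and method names (EscapeOrderDemo, escapeAmpersandLast, escapeAmpersandFirst) are made up for this example and are not part of my actual report code.

```java
// Hypothetical demo class: shows why '&' has to be escaped before '<' and '>'.
class EscapeOrderDemo {

    // Wrong order: '&' is escaped last, so the '&' inside the freshly
    // inserted "&lt;" and "&gt;" entities is escaped a second time.
    static String escapeAmpersandLast(String s) {
        return s.replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("&", "&amp;");
    }

    // Right order: '&' is escaped first, so the entities added afterwards
    // are left untouched.
    static String escapeAmpersandFirst(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String source = "<b>a & b</b>";
        System.out.println(escapeAmpersandLast(source));  // double-escaped
        System.out.println(escapeAmpersandFirst(source)); // escaped once
    }
}
```

With the wrong order the browser displays the literal text &lt; instead of <, which is the mess described above.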

Upvotes: 1

Views: 463

Answers (5)

Keppil

Reputation: 46209

The sequence you post in the comment:

a.replaceAll("<","&lt;");
a.replaceAll(">","&gt;");
a.replaceAll("\"","&;quot;");
a.replaceAll("&","&amp;");

won't work, since the replaceAll() method doesn't change the String it is called on. It can't: Strings are immutable in Java.
Also, as @Rishabh points out, your last replace call will mess up the previous replaces, so you need to change the order.

You need to do

a = a.replaceAll("&","&amp;");
a = ...

Or, just do them all without saving the intermediate result:

a = a.replaceAll("&","&amp;").replaceAll("<","&lt;").replaceAll(">","&gt;").replaceAll("\"","&quot;");

Also, you should probably use the replace() method instead of replaceAll(), since there is no need for regexes in this case.
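As a rough sketch of the replace()-based approach, something like the following should work. The names HtmlEscaper, escape and toPreBlock are invented for this example, and the snippet assumes that only these four characters need escaping for your report.

```java
// Hypothetical helper: escapes page source so it can be embedded as text.
class HtmlEscaper {

    // Uses String.replace(), which matches literal text (no regex involved).
    // '&' is handled first so the entities added later are not re-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wraps the escaped source in a pre element for the report.
    static String toPreBlock(String pageSource) {
        return "<pre>" + escape(pageSource) + "</pre>";
    }

    public static void main(String[] args) {
        String a = "<a href=\"x\">R&D</a>";
        System.out.println(toPreBlock(a));
    }
}
```

In the question's code, a = HtmlEscaper.escape(a); would then take the place of the four separate calls before out.write(...).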

Upvotes: 1

Shrey

Reputation: 2404

replaceAll may work. However, I would always prefer to use StringEscapeUtils (from Apache Commons Lang 3), like this:

a = StringEscapeUtils.escapeHtml4(a);

Upvotes: 1

Gumbo

Reputation: 655189

In Java, String objects are immutable. That means a.replaceAll doesn’t change a but returns a new String object in which the replacement took place.

So to fix this, you need to work with the returned object instead:

a = a.replaceAll("&","&amp;").replaceAll("<","&lt;");

And you actually only need to replace the & and < for your specific application.

Upvotes: 2

Ravi Trivedi

Reputation: 2360

Replace this line:

a.replaceAll("<","&lt;");a.replaceAll(">","&gt;");a.replaceAll("\"","&;quot;");a.replaceAll("&","&amp;");

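With this (note that & has to be replaced first, otherwise the & in the entities produced by the earlier replacements gets escaped again):

a = a.replaceAll("&","&amp;").replaceAll("<","&lt;").replaceAll(">","&gt;").replaceAll("\"","&quot;");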

Upvotes: 0

vishal_aim

Reputation: 7854

Do:

a = a.replaceAll("<","&lt;");

instead of:

a.replaceAll("<","&lt;");

and the same for the others. The replaceAll method doesn't change the string it is called on; it returns a new one.
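A small sketch of the difference, assuming nothing beyond the standard String API (the class and method names are made up for illustration):

```java
// Hypothetical demo class: contrasts discarding and keeping the result.
class ImmutableStringDemo {

    // The result of replaceAll() is thrown away, so 'a' is returned unchanged.
    static String ignoreResult(String a) {
        a.replaceAll("<", "&lt;");
        return a;
    }

    // The result is assigned back, so the replaced text is returned.
    static String keepResult(String a) {
        a = a.replaceAll("<", "&lt;");
        return a;
    }

    public static void main(String[] args) {
        System.out.println(ignoreResult("<p>")); // still contains '<'
        System.out.println(keepResult("<p>"));   // '<' has been escaped
    }
}
```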

Upvotes: 1
