Reputation: 38714

invalid HTML rendering logic

Nearly all browsers use a certain amount of leeway in rendering invalid HTML. For example, they would render x < y as if it were written x &lt; y because it is "clear" that the < is intended as a literal character, not part of an HTML tag.

Where can I find that logic as a separate "cleanup" module? Such a module would convert x < y to x &lt; y.

Upvotes: 2

Answers (5)

Vivin Paliath

Reputation: 95548

Try looking at the source code for Tidy.

HTML before running through Tidy:

<html>

 <head>
  <title>boo</title>
 </head>

 <body>
   x < y
 </body>

</html>

Same HTML after running through Tidy:

<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">

  <title>boo</title>
</head>

<body>
  x &lt; y
</body>
</html>

Notice that x < y was changed to x &lt; y.

UPDATE

Based on your comment, you should probably use Tidy to clean up your HTML. I believe there are Tidy libraries for most common languages that will clean up your HTML for you. If you are using PHP, there is PHP Tidy.

UPDATE

I noticed that you said you're using C#. You can use Tidy with C# as well. Here's something I found. I don't develop in C# and I haven't tried this out so YMMV:

Fix Up Your HTML with HTML Tidy and .NET

Upvotes: 3

Quentin

Reputation: 943759

The HTML 5 (draft) specification includes a detailed parsing algorithm based on how browsers handle bad markup.
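
To give a feel for what a tolerant parse-and-reserialize pass looks like, here is a minimal sketch in Python. It uses the standard library's html.parser, which is lenient but does not implement the HTML5 parsing algorithm itself (a third-party library such as html5lib aims to do that). The names LenientCleaner and clean_html are made up for this example, and the handling of script and style content is a simplifying assumption.

```python
from html import escape
from html.parser import HTMLParser


class LenientCleaner(HTMLParser):
    """Re-serialize markup, escaping stray '<', '>' and '&' in text."""

    RAW_TEXT_TAGS = {"script", "style"}  # their content must not be escaped

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_raw_text = False

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # tag exactly as written
        if tag in self.RAW_TEXT_TAGS:
            self.in_raw_text = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag in self.RAW_TEXT_TAGS:
            self.in_raw_text = False

    def handle_data(self, data):
        # Text arrives with entities decoded, so escaping it again is safe.
        self.out.append(data if self.in_raw_text else escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_html(markup):
    parser = LenientCleaner()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(clean_html("<p>x < y</p>"))  # prints <p>x &lt; y</p>
```

This only covers the easy cases. Anything that depends on tree construction (unclosed or misnested tags, for instance) needs a real HTML5 parser.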

Upvotes: 0

Mike Caron

Reputation: 14561

Edit: I am assuming you're using PHP, since you didn't specify

Use strip_tags:

$content = strip_tags($content, '<b><i>');

This keeps the tags you list as allowed and strips all others. Attributes on the allowed tags are left untouched, so the result is not automatically safe.

Upvotes: -1

You

Reputation: 23794

Rendering of invalid HTML in browsers is horrible guesswork, and you really shouldn't try to emulate it (it will break). However, replacing some occurrences could be done with a regexp:

$data = preg_replace('/(\s)<(\s)/', '$1&lt;$2', $data);
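
For anyone not on PHP, the same idea ports directly to other regex engines. Below is a rough Python equivalent (escape_bare_lt is just a name picked for this sketch), along with a few inputs that show where the pattern applies and where it does not.

```python
import re

# A '<' with whitespace on both sides is assumed to be a literal character.
BARE_LT = re.compile(r"(\s)<(\s)")


def escape_bare_lt(data):
    """Replace whitespace-delimited '<' with '&lt;', keeping the whitespace."""
    return BARE_LT.sub(r"\1&lt;\2", data)


print(escape_bare_lt("x < y"))      # prints x &lt; y
print(escape_bare_lt("x<y"))        # unchanged: no surrounding whitespace
print(escape_bare_lt("<b>x</b>"))   # unchanged: real tags are left alone
```

The "x<y" case is a reminder that this catches only the whitespace-delimited form, which is why it replaces "some occurrences" and not all of them.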

Upvotes: 0

aletzo

Reputation: 2456

Not sure what you mean exactly, but maybe the PHP function htmlentities could help you.
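
One thing to keep in mind with this kind of function is that it escapes every special character, including the ones in real tags. A small Python illustration, using html.escape as a stand-in (it behaves more like PHP's htmlspecialchars than htmlentities, so treat the comparison as approximate; escape_everything is a made-up name):

```python
from html import escape


def escape_everything(text):
    """Escape '&', '<' and '>' everywhere, whether or not they form a tag."""
    return escape(text, quote=False)


print(escape_everything("x < y"))         # prints x &lt; y
print(escape_everything("<b>x < y</b>"))  # prints &lt;b&gt;x &lt; y&lt;/b&gt;
```

So this fits when the input is meant to be plain text, not when it is HTML with an occasional stray <.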

Upvotes: 0
