Wayne

SAX error when reading XML that is not well-formed

I have an XML file that contains a part like the one below. The img and br elements are not meant to be tags, but when parsing, SAX treats img and br as tags, and because they have no closing tags, SAX raises an error. How do I get around this, i.e. how can I make the parser ignore img and br? Thank you!

<summary xml:base="http://www.dailymail.co.uk/health/index.html?ITO=1490" xml:lang="en-GB" type="html">
    <img src="http://i.dailymail.co.uk/i/pix/2011/10/30/article-2055372-01A8032A0000044D-515_87x84.jpg" width="87" height="84"><br>Millions take statins to combat heart disease by lowering cholesterol, but research suggests that high cholesterol could be a key factor in the development of breast cancer.
</summary>
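To reproduce the problem, here is a minimal sketch using Python's built-in xml.sax module (an assumption: the question does not say which language or SAX implementation is in use). The document is a trimmed copy of the fragment above, with the image URL replaced by a placeholder. A well-formedness violation is a fatal error for any conforming XML parser, so other SAX implementations should reject it the same way.

```python
import xml.sax

# Trimmed copy of the <summary> fragment from the question (the image path
# is a placeholder). The img and br tags are never closed, so this is not
# well-formed XML.
BROKEN = (
    '<summary type="html">'
    '<img src="article.jpg" width="87" height="84"><br>'
    'Millions take statins to combat heart disease.'
    '</summary>'
)

def try_parse(xml_text):
    """Parse xml_text with a do-nothing SAX handler.

    Returns None on success, or the SAXParseException message on failure.
    """
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
        return None
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()

print(try_parse(BROKEN))
```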

Upvotes: 1

Answers (3)

G_H

That is not well-formed XML. In XML, every element must be closed, either with a closing tag (<br>...</br>) or implicitly as an empty-element tag (<br/>). If some markup characters are needed as text, then either they should be embedded in a CDATA section...

<![CDATA[This is my <em>character</em> data, not markup.]]>

... or by using character entity references:

This is my &lt;em&gt;character&lt;/em&gt; data, not markup.

SAX has no way of knowing that some markup should be treated as XML and other markup should not, just because it happens to be HTML. If it sees <br>, it will assume that this starts a br element and that a corresponding closing tag will be encountered later.
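Both options can be checked with a small sketch, again assuming Python's xml.sax. The handler records which elements SAX reports and concatenates the character data. The wrapping p element is hypothetical, added only so each example is a complete document. With either the CDATA section or the entity references, p is the only element reported, and the em markup arrives in characters() as plain text.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Record element names and collect all character data."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        # SAX may deliver text in several chunks, so keep them all.
        self.text.append(content)

def collect(xml_text):
    """Parse xml_text and return (element names, joined character data)."""
    handler = TextCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.elements, "".join(handler.text)

# Hypothetical <p> wrappers around the two examples from the answer.
CDATA_DOC = '<p><![CDATA[This is my <em>character</em> data, not markup.]]></p>'
ESCAPED_DOC = '<p>This is my &lt;em&gt;character&lt;/em&gt; data, not markup.</p>'

for doc in (CDATA_DOC, ESCAPED_DOC):
    elements, text = collect(doc)
    print(elements, repr(text))
```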

Upvotes: 1

shift66

Tags must be closed. Try <br/>, and also add a slash (/) before the end of the img tag, like this:
<img src="path"/>
I've tried it, and it worked. ;-)
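With the tags self-closed, the fragment parses. Note, though, that SAX now reports img and br as real (empty) elements, so if only the text is wanted, the handler simply has to do nothing for them. A sketch assuming Python's xml.sax; the ElementLogger class and the placeholder image path are illustrative only.

```python
import xml.sax

# Trimmed copy of the question's fragment with img and br self-closed
# (the image path is a placeholder).
FIXED = (
    '<summary type="html">'
    '<img src="article.jpg" width="87" height="84"/><br/>'
    'Millions take statins to combat heart disease.'
    '</summary>'
)

class ElementLogger(xml.sax.ContentHandler):
    """Log start/end events and keep the character data separately."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.text = []

    def startElement(self, name, attrs):
        # An empty-element tag such as <br/> is reported as a start event
        # immediately followed by an end event.
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # img and br contribute no text, so "ignoring" them just means
        # not acting on their start/end events.
        self.text.append(content)

handler = ElementLogger()
xml.sax.parseString(FIXED.encode("utf-8"), handler)
print(handler.events)
print("".join(handler.text))
```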

Upvotes: 1

Florian Patzl

I think this XML is not well-formed: every XML parser will try to parse the img and br tags in it, and because they are never closed, parsing fails.
They should be wrapped in a CDATA section so that they are not parsed as markup:
http://www.w3schools.com/xml/xml_cdata.asp
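If the file comes from a third party and cannot be fixed at the source, one possible workaround is to wrap the body of each summary in a CDATA section before handing the text to SAX. The sketch below assumes Python's xml.sax; wrap_summaries_in_cdata is a hypothetical helper, and because it uses a regular expression it assumes summary elements are never nested and that the rest of the document is well-formed. Any ]]> inside the body is split across two CDATA sections, since that sequence would otherwise end the section early. Entity references already present in the body (such as &amp;) would come through literally after wrapping.

```python
import re
import xml.sax

# Hypothetical pre-processing step: captures the opening <summary ...> tag,
# its body, and the closing tag. Assumes summaries are never nested.
SUMMARY_RE = re.compile(r"(<summary\b[^>]*>)(.*?)(</summary>)", re.DOTALL)

def wrap_summaries_in_cdata(xml_text):
    """Return xml_text with the body of every <summary> wrapped in CDATA."""
    def wrap(match):
        # "]]>" would close the CDATA section early, so split it in two.
        body = match.group(2).replace("]]>", "]]]]><![CDATA[>")
        return f"{match.group(1)}<![CDATA[{body}]]>{match.group(3)}"
    return SUMMARY_RE.sub(wrap, xml_text)

class SummaryText(xml.sax.ContentHandler):
    """Concatenate all character data; CDATA content arrives here as text."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

# Trimmed copy of the question's fragment (placeholder image path).
doc = (
    '<summary type="html">'
    '<img src="article.jpg" width="87" height="84"><br>'
    'Millions take statins to combat heart disease.'
    '</summary>'
)

handler = SummaryText()
xml.sax.parseString(wrap_summaries_in_cdata(doc).encode("utf-8"), handler)
html = "".join(handler.chunks)
print(html)
```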

Upvotes: 1
