al.

Reputation: 222

SAX - HTML attribute with no value

I am currently using SAX to parse some HTML. However, I now have to parse a document that contains something like this:

`<OPTION VALUE="123" SELECTED>`

and because SELECTED does not have an actual value set, it is throwing an error (not well-formed, invalid token). Is there a way to resolve this so I can keep using SAX?

My code:

        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader xr = sp.getXMLReader();

        xr.setContentHandler(sch);
        InputSource is = new InputSource(Statics.SUBJECT_CODE_URL);
        xr.parse(is);

Upvotes: 0

Views: 191

Answers (2)

Jim Garrison

Reputation: 86774

You can't use SAX to parse HTML. HTML is not XML. A perfectly valid HTML document is NOT a valid XML document, and nothing you can do will make an XML parser parse it.

Upvotes: 1

Konstantin Yovkov

Reputation: 62884

With SAX you can parse XHTML, but you cannot reliably parse HTML, because HTML is not well-formed XML.
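To illustrate the difference, here is a minimal sketch that feeds both forms of the tag to the JDK's own SAX parser. The class name `SaxAttributeDemo`, the helper `parses`, and the two sample strings are made up for this example and are not from the question's code; it only assumes the standard `javax.xml.parsers` and `org.xml.sax` APIs.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only.
class SaxAttributeDemo {
    // HTML-style minimized attribute: SELECTED has no value.
    static final String HTML_STYLE =
        "<SELECT><OPTION VALUE=\"123\" SELECTED>One</OPTION></SELECT>";

    // XHTML-style: every attribute carries an explicit value.
    static final String XHTML_STYLE =
        "<SELECT><OPTION VALUE=\"123\" SELECTED=\"SELECTED\">One</OPTION></SELECT>";

    // Returns true if the SAX parser accepts the markup as well-formed XML,
    // false if it reports a parse error.
    static boolean parses(String markup) throws Exception {
        try {
            SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors such as a missing '=' after an attribute name.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("HTML-style parses:  " + parses(HTML_STYLE));
        System.out.println("XHTML-style parses: " + parses(XHTML_STYLE));
    }
}
```

The first string should be rejected and the second accepted, which is the distinction this answer draws: the parser is fine with XHTML-style attributes, but a valueless attribute is simply not XML.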

Upvotes: 0
