Dealing with iTextSharp XMLWorkerHelper.ParseXHTML strict behavior

Question

While trying to use XMLWorkerHelper.GetInstance().ParseXHTML(), I find that it is really strict. Any misordered or unclosed tag causes it to throw an exception.

I am converting HTML that I have no control over.

Are there any flags to make it less strict? An input callback interface to handle funny markup? Anything in the itextsharp.tools.xml.html namespace? Or an entirely different library that is compatible with itextsharp.text.IElement?

Chris Haas · Accepted Answer

The names of the class and the method pretty much sum it up: you can't. The entire pipeline assumes that it will be given a well-formed XML document; anything else throws an exception. You can customize the pipeline and add your own handlers for things like link resolution, custom CSS properties and new HTML tags, but the core document processor still needs well-formed XHTML.

I would recommend looking into running your HTML through a library that can convert it to XHTML.
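
As a rough sketch of that idea, here is one way it could look. This assumes the HtmlAgilityPack NuGet package as the clean-up library, which is my choice for illustration and not something the answer prescribes; any library that emits well-formed XHTML would do. The class name LenientHtmlToPdf and its two methods are hypothetical helpers. The output is only as good as the clean-up step, so some inputs may still need extra handling before XMLWorker accepts them.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical helper, for illustration only.
public static class LenientHtmlToPdf
{
    // Re-serialize arbitrary HTML as XML using HtmlAgilityPack
    // (assumed library choice; it is not part of iTextSharp).
    public static string ToXhtml(string html)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.OptionOutputAsXml = true;    // write XML instead of HTML
        doc.OptionFixNestedTags = true;  // try to repair badly nested tags
        doc.LoadHtml(html);

        using (var sw = new StringWriter())
        {
            doc.Save(sw);
            return sw.ToString();
        }
    }

    // Clean the markup first, then hand the result to XMLWorker.
    public static void Convert(string html, Stream output)
    {
        string xhtml = ToXhtml(html);

        var pdf = new Document();
        PdfWriter writer = PdfWriter.GetInstance(pdf, output);
        pdf.Open();
        using (var reader = new StringReader(xhtml))
        {
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, pdf, reader);
        }
        pdf.Close();
    }
}
```

Calling LenientHtmlToPdf.Convert(html, stream) with a FileStream or MemoryStream keeps the clean-up step separate from the PDF step, so the clean-up library can be swapped without touching the iTextSharp code.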

EDIT

Also check out wkhtmltopdf. It uses WebKit to render HTML and (apparently) does a pretty good job.
