iTextSharp XMLWorkerHelper and Images for HTML to PDF

Question

Bottom line is I'm using iTextSharp to write out HTML to a PDF -- with an image. Right now, I'm at the latest version of iTextSharp which is 5.5.5.0. I have access to Bruno's book, and I'm using the methodology spelled out by demo.iTextSupport.com for the conversion. Unfortunately, the book doesn't appear to have any reference to XMLWorkerHelper, which is what I'm using to create the PDF from the HTML.

Here's the method I finally got working that successfully generates a PDF from a well-formed HTML string:

private string createPDFFromHtml(string htmlString, string outputFileName)
{
    string result = string.Empty;

    try
    {
        if (!string.IsNullOrEmpty(htmlString) && !string.IsNullOrEmpty(outputFileName) && !File.Exists(outputFileName))
        {
            using (FileStream fos = new FileStream(outputFileName, FileMode.Create))
            {
                using (MemoryStream inputMemoryStream = new MemoryStream(Encoding.ASCII.GetBytes(htmlString)))
                {
                    using (TextReader textReader = new StreamReader(inputMemoryStream, Encoding.ASCII))
                    {
                        using (Document pdfDoc = new Document())
                        {
                            using (PdfWriter pdfWriter = PdfWriter.GetInstance(pdfDoc, fos))
                            {
                                XMLWorkerHelper helper = XMLWorkerHelper.GetInstance();
                                pdfDoc.Open();
                                helper.ParseXHtml(pdfWriter, pdfDoc, textReader);
                                result = "Successfully Created new HTML--> PDF Document!";
                                pdfWriter.CloseStream = false;
                            }
                        }
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        result = "Exception: " + ex.Message;
    }

    return result;
}

This works, and what I'd like to do is create a letter with an image for letterhead. The image is just some JPG that I have lying around on my hard drive somewhere.

Here's what I've tried, but while it successfully plops the image exactly where I want and how I want, the rest of the PDF has severely truncated output.

 private string createPDFFromHtmlWithImage(string htmlString, string outputFileName, string headerImagePath)
        {
            string result = string.Empty;

            try
            {
                if (!string.IsNullOrEmpty(htmlString) && !string.IsNullOrEmpty(outputFileName) && !File.Exists(outputFileName))
                {
                    using (FileStream fos = new FileStream(outputFileName, FileMode.Create))
                    {
                        using (MemoryStream inputMemoryStream = new MemoryStream(Encoding.ASCII.GetBytes(htmlString)))
                        {
                            using (TextReader textReader = new StreamReader(inputMemoryStream, Encoding.ASCII))
                            {
                                using (Document pdfDoc = new Document())
                                {
                                    using (PdfWriter pdfWriter = PdfWriter.GetInstance(pdfDoc, fos))
                                    {
                                        pdfDoc.Open();
                                        Image img = Image.GetInstance(headerImagePath);
                                        if (img != null)
                                        {
                                            img.ScaleToFit(540f, 300f);
                                            pdfDoc.Add(img);
                                        }

                                        XMLWorkerHelper helper = XMLWorkerHelper.GetInstance();
                                        helper.ParseXHtml(pdfWriter, pdfDoc, textReader);

                                        result = "Successfully Created new HTML--> PDF Document!";
                                        pdfWriter.CloseStream = false;
                                    }
                                }
                            }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                result = "Exception: " + ex.Message;
            }

            return result;
        }

The result is a PDF that has the image I want, followed by only the first part of my HTML (even that DIV isn't completely shown), then nothing else.

So, I figured I needed to probably not just blast the textReader into the pdfDoc, but maybe do some "adds" of some sort.

And...here's where I'm getting lost.

I'm thinking I still need to use the XMLWorkerHelper, but I need to do something with IElementHandler rather than just shoving the whole thing into a pdfWriter.

Additional research shows that I can possibly do some tricks with IElements, as described in Chris Haas's wonderful post.

So, I make my own IElementHandler like Chris shows (except I do things the long way, please bear with me):

public class HtmlElementHandler : IElementHandler
{
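    // Elements collected from the parsed HTML (Any() below needs System.Linq)
    public List<IElement> elementList = new List<IElement>();

    public void Add(IWritable e)
    {
        if (e != null && e is WritableElement)
        {
            WritableElement we = e as WritableElement;

            if (we != null)
            {
                IList<IElement> weList = we.Elements();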
                if (weList.Any())
                {
                    elementList.AddRange(weList);
                }
            }
        }
    }
}

Now using this code:

 private string createPDFFromHtmlWithImageElemental(string htmlString, string outputFileName, string headerImagePath)
        {
            string result = string.Empty;

            try
            {
                if (!string.IsNullOrEmpty(htmlString) && !string.IsNullOrEmpty(outputFileName) && !File.Exists(outputFileName))
                {
                    using (FileStream fos = new FileStream(outputFileName, FileMode.Create))
                    {
                        using (MemoryStream inputMemoryStream = new MemoryStream(Encoding.ASCII.GetBytes(htmlString)))
                        {
                            using (TextReader textReader = new StreamReader(inputMemoryStream, Encoding.ASCII))
                            {
                                using (Document pdfDoc = new Document())
                                {
                                    using (PdfWriter pdfWriter = PdfWriter.GetInstance(pdfDoc, fos))
                                    {
                                        pdfDoc.Open();
                                        Image img = Image.GetInstance(headerImagePath);
                                        if (img != null)
                                        {
                                            img.ScaleToFit(540f, 300f);
                                            pdfDoc.Add(img);
                                        }

                                        HtmlElementHandler htmlElementHandler = new HtmlElementHandler();

                                        XMLWorkerHelper helper = XMLWorkerHelper.GetInstance();
                                        helper.ParseXHtml(htmlElementHandler, inputMemoryStream, Encoding.ASCII);

                                        foreach (IElement ielement in htmlElementHandler.elementList)
                                        {
                                            pdfDoc.Add(ielement);
                                        }

                                        result = "Successfully Created new HTML--> PDF Document!";
                                        pdfWriter.CloseStream = false;
                                    }
                                }
                            }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                result = "Exception: " + ex.Message;
            }

            return result;
        }

I get the same exact results as just plopping the whole thing into the pdfDoc like before.

I can see that my element is actually an iTextSharp.text.pdf.PdfDiv with content. Maybe I could do something with that, but I'm really not much of an expert here and I feel like I'm going down the rabbit hole without Alice to guide me.

Additional searching indicates there is a way to get an image embedded, but I'm not all that keen on generating the binary-as-text image string for my image and loading it into the HTML like this solution does. I'd like to be able to choose and change images as needed. I guess I could create a way to take an image, create this binary-text, and insert it into my HTML, but I'd rather see if there is another solution first.

So, you can see what I've tried. I'd appreciate any other help you can provide.

Bruno Lowagie · Accepted Answer

XML Worker isn't mentioned in the book, because the book was written in 2009 and the development of XML Worker started somewhere in 2011. Your question is very long, yet it is missing an important element: an HTML sample like the ones provided for the sandbox examples (which you don't mention). For instance: when we parse the thoreau.html example using ParseHtmlImagesLinksOops, we lose all images: thoreau_oops.pdf; when we use ParseHtmlImagesLinks, we use an ImageProvider that makes sure we get the correct paths to the images and the result looks quite OK: thoreau.pdf (so do the links, by the way).
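For readers without the sandbox code at hand, the ImageProvider idea could be sketched in C# roughly as follows. This is not the ParseHtmlImagesLinks example itself, only an outline of the same pipeline setup. The names `LetterImageProvider`, `HtmlToPdfSketch`, `ParseWithImages`, and the `imageRoot` parameter are hypothetical, and the sketch assumes the HTML uses relative `<img src="...">` paths that live under `imageRoot`. Whether the root path needs a trailing separator is worth checking against your iTextSharp version.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.html;
using iTextSharp.tool.xml.parser;
using iTextSharp.tool.xml.pipeline.css;
using iTextSharp.tool.xml.pipeline.end;
using iTextSharp.tool.xml.pipeline.html;

// Hypothetical provider: tells XML Worker where relative image paths are rooted.
public class LetterImageProvider : AbstractImageProvider
{
    private readonly string _root;

    public LetterImageProvider(string root) { _root = root; }

    public override string GetImageRootPath() { return _root; }
}

public static class HtmlToPdfSketch
{
    // Hypothetical helper: parses XHTML into a PDF, resolving images via imageRoot.
    public static void ParseWithImages(string html, string outputFileName, string imageRoot)
    {
        using (FileStream fos = new FileStream(outputFileName, FileMode.Create))
        using (Document pdfDoc = new Document())
        {
            PdfWriter pdfWriter = PdfWriter.GetInstance(pdfDoc, fos);
            pdfDoc.Open();

            // HTML context: default tag processors plus the custom image provider.
            HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
            htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
            htmlContext.SetImageProvider(new LetterImageProvider(imageRoot));

            // Default CSS resolver (true = include the built-in default stylesheet).
            ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);

            // Pipelines are chained CSS -> HTML -> PDF writer.
            PdfWriterPipeline pdf = new PdfWriterPipeline(pdfDoc, pdfWriter);
            HtmlPipeline htmlPipeline = new HtmlPipeline(htmlContext, pdf);
            CssResolverPipeline css = new CssResolverPipeline(cssResolver, htmlPipeline);

            XMLWorker worker = new XMLWorker(css, true);
            XMLParser parser = new XMLParser(worker);
            using (StringReader reader = new StringReader(html))
            {
                parser.Parse(reader);
            }
        }
    }
}
```

Compared with the plain `XMLWorkerHelper.ParseXHtml` call, the only functional difference in this sketch is the image provider set on the `HtmlPipelineContext`; the rest is the wiring needed to reach that setting.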

However, when I look at the actual requirement, I see that you want to create a letter with an image for letterhead. In that case, I would use page events to add company stationery to each page. How to do that is explained in the book.
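As a rough illustration of the page-event approach (a sketch, not the example from the book), the letterhead could be drawn in `OnEndPage` while the document's top margin is enlarged to keep the HTML content clear of it. The class names `LetterheadPageEvent` and `LetterSketch`, the method `CreateLetter`, the Letter page size, and the 36pt/12pt offsets are all assumptions made for this example.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical page event that stamps the letterhead image on every page.
public class LetterheadPageEvent : PdfPageEventHelper
{
    private readonly Image _letterhead;

    public LetterheadPageEvent(Image letterhead) { _letterhead = letterhead; }

    public override void OnEndPage(PdfWriter writer, Document document)
    {
        // Position the image 36pt below the top edge, aligned with the left margin.
        float x = document.LeftMargin;
        float y = document.PageSize.Height - 36f - _letterhead.ScaledHeight;
        _letterhead.SetAbsolutePosition(x, y);

        // Draw under the page content so text is never hidden by the image.
        writer.DirectContentUnder.AddImage(_letterhead);
    }
}

public static class LetterSketch
{
    // Hypothetical helper: HTML body plus a letterhead image on each page.
    public static void CreateLetter(string html, string outputFileName, string headerImagePath)
    {
        Image img = Image.GetInstance(headerImagePath);
        img.ScaleToFit(540f, 300f);

        // Reserve a top margin tall enough for the image plus a 12pt gap (assumed values).
        float topMargin = 36f + img.ScaledHeight + 12f;

        using (FileStream fos = new FileStream(outputFileName, FileMode.Create))
        using (Document pdfDoc = new Document(PageSize.LETTER, 36f, 36f, topMargin, 36f))
        {
            PdfWriter pdfWriter = PdfWriter.GetInstance(pdfDoc, fos);
            pdfWriter.PageEvent = new LetterheadPageEvent(img);
            pdfDoc.Open();

            using (StringReader reader = new StringReader(html))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(pdfWriter, pdfDoc, reader);
            }
        }
    }
}
```

Because the image is added to the page's direct content rather than to the document flow, it does not interact with the layout of the parsed HTML, and the same `Image` instance is reused on every page.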
