Trouble implementing iTextSharp to convert HTML to PDF

Question

Edit: Perfect solution was proposed below (streams closed in wrong order). I ended up going with an open-source alternative of PreMailer.Net + HtmlAgilityPack + wkHTMLtoPDF as it better fit my needs.

I am attempting to implement iTextSharp in C# to convert HTML to a PDF file, including converting relative URIs for links and images. To try things out, I have a very basic implementation of "Changing the Default Configuration" (http://demo.itextsupport.com/xmlworker/itextdoc/flatsite.html), converted from Java to C#. However, when I feed my sample HTML (which I have tested) into my program, the PDF it creates shows the following contents when opened in a text editor:

%PDF-1.4
%âãÏÓ

This seems wrong. Also, the MemoryStream has a very small number of bytes associated with it. Is something wrong with my implementation of iTextSharp, or am I using streams or other C# constructs incorrectly?

using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml.html;
using iTextSharp.tool.xml.pipeline.html;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.parser;
using iTextSharp.tool.xml.pipeline.css;
using iTextSharp.tool.xml.pipeline.end; 

class Program
{
    static void Main(string[] args)
    {
        FontFactory.RegisterDirectories();
        var document = new Document();
        var memoryStream = new MemoryStream();
        var pdfWriter = PdfWriter.GetInstance(document, memoryStream );
        document.Open();

        var htmlContext = new HtmlPipelineContext(null);
        htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
        htmlContext.SetImageProvider(new ImageProvider());
        htmlContext.SetLinkProvider(new LinkProvider());
        htmlContext.CharSet(Encoding.UTF8);

        var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
        var pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, pdfWriter)));
        var xmlWorker = new XMLWorker(pipeline, true);
        var xmlParser = new XMLParser(xmlWorker);

        var inputFileStream = new FileStream("testHTML.html", FileMode.Open);
        xmlParser.Parse(inputFileStream);
        inputFileStream.Close();

        memoryStream.Position = 0;
        pdfWriter.CloseStream = false;

        var outputFileStream = new FileStream("testOutput.pdf", FileMode.Create, FileAccess.Write);
        memoryStream.WriteTo(outputFileStream);

        outputFileStream.Close();
        document.Close();
    }
}

class ImageProvider : AbstractImageProvider
{
    public override string GetImageRootPath()
    {
        return "testDir/";
    }
}

class LinkProvider : ILinkProvider
{
    public string GetLinkRoot()
    {
        return "http://www.examplesite.com/testdir/";
    }
}

Thanks so much for your time and help!

mkl · Accepted Answer

You grab the contents of the memory stream before closing the iText document:

    memoryStream.WriteTo(outputFileStream);

    outputFileStream.Close();
    document.Close();

But iText completes the output PDF only when the document is closed; in particular, that is when it flushes the contents of the current last page and adds the cross-references etc.

Thus, change your code from this:

    memoryStream.Position = 0;
    pdfWriter.CloseStream = false;

    var outputFileStream = new FileStream("testOutput.pdf", FileMode.Create, FileAccess.Write);
    memoryStream.WriteTo(outputFileStream);

    outputFileStream.Close();
    document.Close();

to this:

    pdfWriter.CloseStream = false;
    document.Close();

    var outputFileStream = new FileStream("testOutput.pdf", FileMode.Create, FileAccess.Write);
    memoryStream.Position = 0;
    memoryStream.WriteTo(outputFileStream);
    outputFileStream.Close();
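
The same ordering rule can also be sketched without touching `CloseStream` at all. This is only a minimal illustration, not your full XML Worker pipeline: the class name `PdfSketch` and the helper `BuildPdf` are made up for this example, and it adds a single paragraph instead of parsing HTML. It relies on the fact that `MemoryStream.ToArray()` still works after the stream has been closed, so the bytes are read only after `document.Close()` has finished the PDF.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

static class PdfSketch
{
    // Hypothetical helper for illustration: builds a one-paragraph PDF in memory.
    public static byte[] BuildPdf(string text)
    {
        using (var memoryStream = new MemoryStream())
        {
            var document = new Document();
            PdfWriter.GetInstance(document, memoryStream);
            document.Open();
            document.Add(new Paragraph(text));

            // Close() flushes the last page and writes the cross-reference
            // table and trailer. With the default CloseStream setting it
            // also closes memoryStream.
            document.Close();

            // ToArray() is still allowed on a closed MemoryStream, so the
            // complete PDF can be read here, after the document is closed.
            return memoryStream.ToArray();
        }
    }
}
```

The result could then be written out with something like `File.WriteAllBytes("testOutput.pdf", PdfSketch.BuildPdf("Hello"));`. Whichever variant you use, the point is the same: read the stream contents only after the document has been closed.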
