Remy

Reputation: 407

iTextSharp 5: writer.DirectContent creates a 50% smaller file than PdfStamper

I would really like to use the newer approach, PdfStamper, instead of the older one (PdfWriter.GetInstance ... writer.DirectContent), but the PDF created with the older method is half the size of the one created with the newer approach. Is there something I am missing between the two approaches?

//Old way using PdfWriter.GetInstance... writer.DirectContent
        public static void AddHeaderTextLayer()
        {
            string HdrLeft = string.Empty;
            string HdrRight = string.Empty;
            string PageHdrName = "XHdr";
            string NoOfPagesPadded = string.Empty;
            string PageNoPadded = string.Empty;
            int xLeft = 30;
            int xRight = 100;
            int xTop = 15;
            string filename = "4_20140909.pdf";

            PdfReader reader = new PdfReader(@"C:\!stuff\Junk\ChemWatchPDF\" + filename); // input file

            using (var fileStream = new FileStream(@"C:\!stuff\Junk\ChemWatchPDF\" + filename.Replace(".pdf", "") + "_withHdrLTp.pdf", FileMode.Create, FileAccess.Write))
            {
                var document = new Document(reader.GetPageSizeWithRotation(1));
                var writer = PdfWriter.GetInstance(document, fileStream);
                document.Open();

                for (var i = 1; i <= reader.NumberOfPages; i++)
                {
                    Rectangle pageRect = reader.GetPageSize(i);
                    document.NewPage();

                    var baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
                    var importedPage = writer.GetImportedPage(reader, i);
                    var contentByte = writer.DirectContent;

                    contentByte.AddTemplate(importedPage, 0, 0);
                    string SDSNo = "12345678";
                    HdrLeft = $"Company MSDS# {SDSNo}";

                    NoOfPagesPadded = (reader.NumberOfPages.ToString());
                    PageNoPadded = i.ToString();
                    HdrRight = $" Page {PageNoPadded} of {NoOfPagesPadded}";

                    contentByte.BeginLayer(new PdfLayer(PageHdrName + i.ToString(), writer));

                    contentByte.BeginText();
                    contentByte.SetFontAndSize(baseFont, 10);
                    contentByte.SetColorFill(LabColor.RED);
                    contentByte.ShowTextAligned(PdfContentByte.ALIGN_LEFT, HdrLeft, pageRect.Left + xLeft, pageRect.Top - xTop, 0);
                    contentByte.EndText();

                    contentByte.BeginText();
                    contentByte.SetFontAndSize(baseFont, 10);
                    contentByte.SetColorFill(LabColor.RED);
                    contentByte.ShowTextAligned(PdfContentByte.ALIGN_LEFT, HdrRight, pageRect.Right - xRight, pageRect.Top - xTop, 0);
                    contentByte.EndText();

                    contentByte.EndLayer();
                }
                document.Close();
                writer.Close();
            }
        }

// New way using PdfStamper
        public void Add()
        {
            BaseFont baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, Encoding.ASCII.EncodingName, false);
            string outPutFile = string.Empty;
            var SingleLine = string.Empty;
            string HdrLeft = string.Empty;
            string HdrRight = string.Empty;
            string PageHdrName = "xHdr";
            string NoOfPagesPadded = string.Empty;
            string PageNoPadded = string.Empty;
            int xLeft = 30;
            int xRight = 100;
            int xTop = 15;
            string filename = "4_20140909.pdf";
            outPutFile = @"C:\!stuff\Junk\ChemWatchPDF\" + filename.Replace(".pdf", "") + "_withHdrLTStamp.pdf";

            using (var newPDF = new FileStream(outPutFile, FileMode.Create, FileAccess.ReadWrite))
            {
                PdfReader reader = new PdfReader(@"C:\!stuff\Junk\ChemWatchPDF\" + filename); // input file
                PdfStamper pdfStamper = new PdfStamper(reader, newPDF);
                PdfLayer wmLayer = new PdfLayer(PageHdrName, pdfStamper.Writer);
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    PdfContentByte pdfContent = pdfStamper.GetOverContent(page);
                    Rectangle pageRect = reader.GetPageSize(page);
                    string SDSNo = "12345678";
                    HdrLeft = $"Company MSDS# {SDSNo}";
                    NoOfPagesPadded = (reader.NumberOfPages.ToString());
                    PageNoPadded = page.ToString();

                    HdrRight = $"Page {PageNoPadded} of {NoOfPagesPadded}";
                    pdfContent.BeginLayer(wmLayer);

                    pdfContent.BeginText();
                    pdfContent.SetFontAndSize(baseFont, 10);
                    pdfContent.SetColorFill(LabColor.RED);
                    pdfContent.ShowTextAligned(PdfContentByte.ALIGN_LEFT, HdrLeft, pageRect.Left + xLeft, pageRect.Top - xTop, 0);
                    pdfContent.EndText();

                    pdfContent.BeginText();
                    pdfContent.SetFontAndSize(baseFont, 10);
                    pdfContent.SetColorFill(LabColor.RED);
                    pdfContent.ShowTextAligned(PdfContentByte.ALIGN_LEFT, HdrRight, pageRect.Right - xRight, pageRect.Top - xTop, 0);
                    pdfContent.EndText();

                    pdfContent.EndLayer();
                }
                pdfStamper.Close();
            }
        }

Upvotes: 0

Views: 703

Answers (1)

mkl

Reputation: 95918

Your stamped copy (the output of the "newer approach") contains a structure tree, which I assume comes from the original document. It is lost in the output of the "old approach".

The structure tree describes the logical structure of the document. It makes the document more accessible, and its presence is becoming a legal requirement in more and more countries and contexts. Thus, throwing away the structure tree is generally a bad idea.
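To see which of your files still carries a structure tree, you can look for the /StructTreeRoot entry in the document catalog. The following is only a minimal sketch; the class and method names (StructureTreeCheck, HasStructureTree) are made up for illustration, and it assumes iTextSharp 5.x:

```csharp
using iTextSharp.text.pdf;

static class StructureTreeCheck
{
    // Hypothetical helper: returns true when the document catalog
    // contains a /StructTreeRoot entry, i.e. the PDF has a structure tree.
    public static bool HasStructureTree(string path)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            return reader.Catalog.Get(PdfName.STRUCTTREEROOT) != null;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Running this against the input file and both outputs should show where the structure tree survives and where it is dropped.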

The structure tree itself consists of a great many small indirect objects. In your PDF there are more than 1000 of them, approximately 90KB in size altogether. Furthermore, each indirect object requires a 20-byte cross reference entry, which adds up to nearly 20KB in your case. This explains nearly all of the 111KB difference in size between the two outputs.
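The estimate above can be reproduced with a few lines of arithmetic. This sketch only restates the figures from this answer; the object count of 1000 is a lower bound ("more than 1000"), so the real overhead is somewhat higher:

```csharp
using System;

class XrefOverheadEstimate
{
    static void Main()
    {
        // Figures taken from the answer; the object count is a lower bound.
        const int indirectObjects = 1000;
        const int bytesPerXrefEntry = 20;     // fixed-width entry in a classic cross reference table
        const double structureTreeKb = 90.0;  // approximate size of the structure tree objects

        double xrefKb = indirectObjects * bytesPerXrefEntry / 1024.0;
        double totalKb = structureTreeKb + xrefKb;

        Console.WriteLine($"Cross reference entries: about {xrefKb:F1} KB");
        Console.WriteLine($"Structure tree plus entries: about {totalKb:F1} KB of the 111 KB difference");
    }
}
```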

If you use object streams and cross reference streams, the structure tree can usually be compressed fairly well. Thus, I suggest activating full compression in iText, which makes it use object streams and cross reference streams:

PdfStamper pdfStamper = new PdfStamper(reader, newPDF);
pdfStamper.SetFullCompression();
pdfStamper.Writer.CompressionLevel = 9;

By simply running your large PDF through a PdfReader/PdfStamper pair with these settings, without any other manipulation, I reduced the size of your file from 234KB to 133KB.
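For reference, such a compression-only pass could look roughly like the following. This is a sketch rather than the exact code I ran; the class name PdfRecompressor, the method name Recompress, and both path parameters are hypothetical, and it assumes iTextSharp 5.x:

```csharp
using System.IO;
using iTextSharp.text.pdf;

static class PdfRecompressor
{
    // Hypothetical helper: copies the PDF through a PdfStamper with full
    // compression enabled and makes no other changes to the document.
    public static void Recompress(string inputPath, string outputPath)
    {
        PdfReader reader = new PdfReader(inputPath);
        try
        {
            using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
            {
                PdfStamper stamper = new PdfStamper(reader, output);
                // Use object streams and cross reference streams.
                stamper.SetFullCompression();
                // Highest compression level for stream contents.
                stamper.Writer.CompressionLevel = 9;
                // Close() writes the result to the output stream.
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Object streams and cross reference streams require PDF 1.5 or later, so expect the output to be written as at least that version. In your Add() method the same two settings go right after the PdfStamper is constructed, before the loop over the pages.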


By the way, you call the approach with the PdfWriter and page imports the "old way" and the approach with the PdfStamper the "new way". Actually, the PdfStamper class has existed in iText since at least 2003! So it's not really old vs. new...

Upvotes: 1
