Reputation: 19
I am upgrading some older iTextSharp code to the new iText 7 libraries. I am having a lot of trouble determining the proper way to merge 2 PDF MemoryStreams into one PDF MemoryStream that contains all the pages from both source PDF MemoryStreams. It seems simple and I think the code below is set up properly but the resulting PDF memory stream only contains the first file. The second PDF file is never present and never concatenated to the first.
I have found multiple ways documented on the Internet as the "proper" way to do the merge. The actual sample code with iText 7 seems to be unusually complex (in that is mixes multiple concepts into one sample repeatedly - as in doesn't reduce the concept to the simplest possible code), and seems to fail to demonstrate simple concepts. For instance, their PDFMerge documentation has no sample code at all in the documentation (nor does anything else I looked at in the class documentation). The examples they have online actually always mix merging from files (not MemoryStreams) with other concepts like adding page numbers or adding Table of Contents. So they never just show one concept and they never start with anything other than files. My PDFs are coming out of a database and we just need to merge them into one PDF memory stream and save it back out. My concern is that maybe I am not creating the MemoryStream properly when I initialize the PDFWriter. As none of their samples ever do anything but initial with an actual file, I was unable to confirm this was done properly. I also fully qualified all objects in the code because I want to leave the old iTextSharp code in place while I am upgrading to the new iText 7. This was done to make sure an iTextSharp object of the same name wasn't inadvertently being unknowingly used.
Also, in the interest of making the source as easy as possible to read I removed some of the declarations and initialization of objects being used. Everything was traced through and all values are fully loaded with proper values as you trace through the code. The only problem is that the second PDFMerge doesn't seem to do anything. I am assuming the problem is that I didn't prepare the PDF objects properly or that I have to do something special with the PDFWriter on the Destination PDF Document (p_pdfDocument) before the second PDF is written out with the PDFMerge object.
Dim p_bResult As Boolean = False
Dim p_bArray As Byte() = Nothing
Dim p_memStream As New System.IO.MemoryStream
Dim p_pdfWriter As New iText.Kernel.Pdf.PdfWriter(p_memStream)
Dim p_pdfDocument As New iText.Kernel.Pdf.PdfDocument(p_pdfWriter)
Dim p_pdf1Stream As New System.IO.MemoryStream(CType(p_cImage1.ImageFile, Byte()))
Dim p_pdf2Stream As New System.IO.MemoryStream(CType(p_cImage2.ImageFile, Byte()))
Dim p_pdf1Reader As New iText.Kernel.Pdf.PdfReader(p_pdf1Stream)
Dim p_pdf2Reader As New iText.Kernel.Pdf.PdfReader(p_pdf2Stream)
Dim p_pdf1Document As New iText.Kernel.Pdf.PdfDocument(p_pdf1Reader)
Dim p_pdf2Document As New iText.Kernel.Pdf.PdfDocument(p_pdf2Reader)
Dim p_pdfMerger As New iText.Kernel.Utils.PdfMerger(p_pdfDocument)
p_pdfMerger.Merge(p_pdf1Document, 1, p_pdf1Document.GetNumberOfPages())
p_pdfMerger.Merge(p_pdf2Document, 1, p_pdf2Document.GetNumberOfPages())
'Problem is here... the array only has the first PDF in it
'The second p_pdfMerger.Merge didn't seem to do anything
p_bArray = p_memStream.ToArray
p_pdf1Document.Close()
p_pdf2Document.Close()
p_pdfDocument.Close()
I expected the 2 source PDF MemoryStreams to be present in the destination MemoryStream but it only contains the first PDF in it.
Edit:
I changed the ending to...
p_pdfMerger.Merge(p_pdf1Document, 1, p_pdf1Document.GetNumberOfPages())
p_pdfMerger.Merge(p_pdf2Document, 1, p_pdf2Document.GetNumberOfPages())
p_cImage1.PageCount = p_pdfDocument.GetNumberOfPages()
p_pdfDocument.Close()
p_bArray = p_memStream.ToArray
p_pdf1Document.Close()
p_pdf2Document.Close()
Thing is that the p_pdfDocument.GetNumberOfPages()
is correct but bytes are still just first PDF document when saved to database and viewed.
Upvotes: 0
Views: 1302
Reputation: 95888
I tested your use case, condensing your code a bit, reading the input memory streams from files, and writing the output memory stream to a file as I don't have your database environment:
Using MemoryStream As New MemoryStream,
Pdf1MemoryStream As New MemoryStream(File.ReadAllBytes(MY_FIRST_PDF_FILE)),
Pdf2MemoryStream As New MemoryStream(File.ReadAllBytes(MY_SECOND_PDF_FILE))
Using PdfDocument As New PdfDocument(New PdfWriter(MemoryStream)),
Pdf1 As New PdfDocument(New PdfReader(Pdf1MemoryStream)),
Pdf2 As New PdfDocument(New PdfReader(Pdf2MemoryStream))
Dim Merger As New PdfMerger(PdfDocument)
Merger.Merge(Pdf1, 1, Pdf1.GetNumberOfPages)
Merger.Merge(Pdf2, 1, Pdf2.GetNumberOfPages)
End Using
Dim PdfBytes As Byte() = MemoryStream.ToArray()
Using FileStream As Stream = File.Create("TwoPdfsMergedInMemoryStream.pdf")
FileStream.Write(PdfBytes, 0, PdfBytes.Length)
End Using
End Using
As result I got the contents of both source files in TwoPdfsMergedInMemoryStream.pdf
as it should be. Concerning your observation
Thing is that the
p_pdfDocument.GetNumberOfPages()
is correct but bytes are still just first PDF document when saved to database and viewed.
therefore, I would assume that p_bArray
does contain a PDF with the contents of both your source PDFs but that there is an issue in saving to database or viewing.
To test this you could save the contents of the byte array to a file somewhere like I do above; then you can check what really is in the array.
Upvotes: 0