Reputation: 764

VB.Net Merge multiple pdfs into one and export

I have to merge multiple PDFs into a single PDF.

I am using the iTextSharp library. I found some code (from here) and tried to use it; the original is in C#, and I converted it to VB.NET.

Private Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
    Dim mergedPdf As Byte() = Nothing
    Using ms As New MemoryStream()
        Using document As New Document()
            Using copy As New PdfCopy(document, ms)
                document.Open()
                For i As Integer = 0 To sourceFiles.Count - 1
                    Dim reader As New PdfReader(sourceFiles(i))
                    ' loop over the pages in that document
                    Dim n As Integer = reader.NumberOfPages
                    Dim page As Integer = 0
                    While page < n
                        page = page + 1
                        copy.AddPage(copy.GetImportedPage(reader, page))
                    End While
                Next
            End Using
        End Using
        mergedPdf = ms.ToArray()
    End Using
    Return mergedPdf
End Function

I am now getting the following error:

An item with the same key has already been added.

I did some debugging and have tracked the problem down to the following line:

copy.AddPage(copy.GetImportedPage(reader, page))

Why is this error happening?

Upvotes: 5

Answers (4)

Gerald Leesmann

Reputation: 29

The code that was marked as correct does not close all of the file streams, so the files stay open within the app and you won't be able to delete unused PDFs within your project.

This is a better solution:

Public Sub MergePDFFiles(ByVal outPutPDF As String)
    ' FileArray is a list declared globally
    Dim StartPath As String = FileArray(0)
    Dim document = New Document()
    ' outPutPDF is the output path, passed in from another sub
    Dim outFile = Path.Combine(outPutPDF)
    Dim writer = New PdfCopy(document, New FileStream(outFile, FileMode.Create))

    Try
        document.Open()
        For Each fileName As String In FileArray
            Dim reader = New PdfReader(Path.Combine(StartPath, fileName))

            ' Copy every page of the current input file into the output
            For i As Integer = 1 To reader.NumberOfPages
                Dim page = writer.GetImportedPage(reader, i)
                writer.AddPage(page)
            Next i

            ' Release the input file before moving on to the next one
            reader.Close()
        Next
    Catch ex As Exception
        ' Handle or log the exception if needed
    Finally
        ' Close here so the output stream is released even if the merge fails
        writer.Close()
        document.Close()
    End Try
End Sub
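
For comparison, here is a rough sketch of the same idea using Using blocks, so the output stream, the document, and the copier are closed even when an exception is thrown. It assumes an iTextSharp 5.x build in which Document and PdfCopy implement IDisposable (the question's code relies on the same thing). The module name, MergeToFile, and its parameters are made up for illustration and are not part of the answer above.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module PdfMergeSketch
    ' Merges the PDFs at sourcePaths, in order, into the file at outputPath.
    ' Assumption: iTextSharp 5.x, where Document and PdfCopy can be used in Using blocks.
    Public Sub MergeToFile(ByVal sourcePaths As IEnumerable(Of String), ByVal outputPath As String)
        Using output As New FileStream(outputPath, FileMode.Create)
            Using document As New Document()
                Using copy As New PdfCopy(document, output)
                    document.Open()
                    For Each sourcePath As String In sourcePaths
                        Dim reader As New PdfReader(sourcePath)
                        Try
                            ' Page numbers are 1-based
                            For pageNumber As Integer = 1 To reader.NumberOfPages
                                copy.AddPage(copy.GetImportedPage(reader, pageNumber))
                            Next
                        Finally
                            ' Always release the input file, even if a page fails to copy
                            reader.Close()
                        End Try
                    Next
                End Using
            End Using
        End Using
    End Sub
End Module
```

One caveat: if nothing was copied (for example, the first input could not be read), closing the document can raise its own "no pages" error, so the original exception may need to be logged before the Using blocks unwind.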

Upvotes: 1

G_Hosa_Phat

Reputation: 1110

I realize I'm pretty late to the party, but after reading the comments from @BrunoLowagie, I wanted to see if I could put something together myself that uses the examples from his linked sample chapter. It's probably overkill, but I put together some code that merges multiple PDFs into a single file that I posted on the Code Review SE site (the post, VB.NET - Error Handling in Generic Class for PDF Merge, contains the full class code). It only merges PDF files right now, but I'm planning on adding methods for additional functionality later.

The "master" method (towards the end of the Class block in the linked post, and also posted below for reference) handles the actual merging of the PDF files, but the multiple overloads provide a number of options for how to define the list of original files. So far, I've included the following features:

The methods return a System.IO.FileInfo object if the merge is successful.
Provide a System.IO.DirectoryInfo object or a System.String identifying a path and it will collect all PDF files in that directory (including sub-directories if specified) to merge.
Provide a List(Of System.String) or a List(Of System.IO.FileInfo) specifying the PDFs you want to merge.
Identify how the PDFs should be sorted before the merge (especially useful if you use one of the MergeAll methods to get all PDF files in a directory).
If the specified output PDF file already exists, you can specify whether or not you want to overwrite it. (I'm considering adding the "ability" to automatically adjust the output PDF file's name if it already exists).
Warning and Error properties provide a way to get feedback in the calling method, whether or not the merge is successful.

Once the code is in place, it can be used like this:

Dim PDFDir As New IO.DirectoryInfo("C:\Test Data\PDF\")
Dim ResultFile As IO.FileInfo = Nothing
Dim Merger As New PDFManipulator

ResultFile = Merger.MergeAll(PDFDir, "C:\Test Data\PDF\Merged.pdf", True, PDFManipulator.PDFMergeSortOrder.FileName, True)

Here is the "master" method. As I said, it's probably overkill (and I'm still tweaking it some), but I wanted to do my best to try to make it work as effectively as possible. Obviously it requires a Reference to the itextsharp.dll for access to the library's functions.

I've commented out the references to the Error and Warning properties of the class for this post to help reduce any confusion.

Public Function Merge(ByVal PDFFiles As List(Of System.IO.FileInfo), ByVal OutputFileName As String, ByVal OverwriteExistingPDF As Boolean, ByVal SortOrder As PDFMergeSortOrder) As System.IO.FileInfo
    Dim ResultFile As System.IO.FileInfo = Nothing
    Dim ContinueMerge As Boolean = True

    If OverwriteExistingPDF Then
        If System.IO.File.Exists(OutputFileName) Then
            Try
                System.IO.File.Delete(OutputFileName)
            Catch ex As Exception
                ContinueMerge = False

                'If Errors Is Nothing Then
                '    Errors = New List(Of String)
                'End If

                'Errors.Add("Could not delete existing output file.")

                Throw
            End Try
        End If
    End If

    If ContinueMerge Then
        Dim OutputPDF As iTextSharp.text.Document = Nothing
        Dim Copier As iTextSharp.text.pdf.PdfCopy = Nothing
        Dim PDFStream As System.IO.FileStream = Nothing
        Dim SortedList As New List(Of System.IO.FileInfo)

        Try
            Select Case SortOrder
                Case PDFMergeSortOrder.Original
                    SortedList = PDFFiles
                Case PDFMergeSortOrder.FileDate
                    SortedList = PDFFiles.OrderBy(Function(f As System.IO.FileInfo) f.LastWriteTime).ToList
                Case PDFMergeSortOrder.FileName
                    SortedList = PDFFiles.OrderBy(Function(f As System.IO.FileInfo) f.Name).ToList
                Case PDFMergeSortOrder.FileNameWithDirectory
                    SortedList = PDFFiles.OrderBy(Function(f As System.IO.FileInfo) f.FullName).ToList
            End Select

            If Not IO.Directory.Exists(New IO.FileInfo(OutputFileName).DirectoryName) Then
                Try
                    IO.Directory.CreateDirectory(New IO.FileInfo(OutputFileName).DirectoryName)
                Catch ex As Exception
                    ContinueMerge = False

                    'If Errors Is Nothing Then
                    '    Errors = New List(Of String)
                    'End If

                    'Errors.Add("Could not create output directory.")

                    Throw
                End Try
            End If

            If ContinueMerge Then
                OutputPDF = New iTextSharp.text.Document
                PDFStream = New System.IO.FileStream(OutputFileName, System.IO.FileMode.OpenOrCreate)
                Copier = New iTextSharp.text.pdf.PdfCopy(OutputPDF, PDFStream)

                OutputPDF.Open()

                For Each PDF As System.IO.FileInfo In SortedList
                    If ContinueMerge Then
                        Dim InputReader As iTextSharp.text.pdf.PdfReader = Nothing

                        Try
                            InputReader = New iTextSharp.text.pdf.PdfReader(PDF.FullName)

                            For page As Integer = 1 To InputReader.NumberOfPages
                                Copier.AddPage(Copier.GetImportedPage(InputReader, page))
                            Next page

                            If InputReader.IsRebuilt Then
                                'If Warnings Is Nothing Then
                                '    Warnings = New List(Of String)
                                'End If

                                'Warnings.Add("Damaged PDF: " & PDF.FullName & " repaired and successfully merged into output file.")
                            End If
                        Catch InvalidEx As iTextSharp.text.exceptions.InvalidPdfException
                            'Skip this file
                            'If Errors Is Nothing Then
                            '    Errors = New List(Of String)
                            'End If

                            'Errors.Add("Invalid PDF: " & PDF.FullName & " not merged into output file.")
                        Catch FormatEx As iTextSharp.text.pdf.BadPdfFormatException
                            'Skip this file
                            'If Errors Is Nothing Then
                            '    Errors = New List(Of String)
                            'End If

                            'Errors.Add("Bad PDF Format: " & PDF.FullName & " not merged into output file.")
                        Catch PasswordEx As iTextSharp.text.exceptions.BadPasswordException
                            'Skip this file
                            'If Errors Is Nothing Then
                            '    Errors = New List(Of String)
                            'End If

                            'Errors.Add("Password-protected PDF: " & PDF.FullName & " not merged into output file.")
                        Catch OtherEx As Exception
                            ContinueMerge = False
                        Finally
                            If Not InputReader Is Nothing Then
                                InputReader.Close()
                                InputReader.Dispose()
                            End If
                        End Try
                    End If
                Next PDF
            End If
        Catch ex As iTextSharp.text.pdf.PdfException
            ResultFile = Nothing
            ContinueMerge = False

            'If Errors Is Nothing Then
            '    Errors = New List(Of String)
            'End If

            'Errors.Add("iTextSharp Error: " & ex.Message)

            If System.IO.File.Exists(OutputFileName) Then
                If Not OutputPDF Is Nothing Then
                    OutputPDF.Close()
                    OutputPDF.Dispose()
                End If

                If Not PDFStream Is Nothing Then
                    PDFStream.Close()
                    PDFStream.Dispose()
                End If

                If Not Copier Is Nothing Then
                    Copier.Close()
                    Copier.Dispose()
                End If

                System.IO.File.Delete(OutputFileName)
            End If

            Throw
        Catch other As Exception
            ResultFile = Nothing
            ContinueMerge = False

            'If Errors Is Nothing Then
            '    Errors = New List(Of String)
            'End If

            'Errors.Add("General Error: " & other.Message)

            If System.IO.File.Exists(OutputFileName) Then
                If Not OutputPDF Is Nothing Then
                    OutputPDF.Close()
                    OutputPDF.Dispose()
                End If

                If Not PDFStream Is Nothing Then
                    PDFStream.Close()
                    PDFStream.Dispose()
                End If

                If Not Copier Is Nothing Then
                    Copier.Close()
                    Copier.Dispose()
                End If

                System.IO.File.Delete(OutputFileName)
            End If

            Throw
        Finally
            If Not OutputPDF Is Nothing Then
                OutputPDF.Close()
                OutputPDF.Dispose()
            End If

            If Not PDFStream Is Nothing Then
                PDFStream.Close()
                PDFStream.Dispose()
            End If

            If Not Copier Is Nothing Then
                Copier.Close()
                Copier.Dispose()
            End If

            If System.IO.File.Exists(OutputFileName) Then
                If ContinueMerge Then
                    ResultFile = New System.IO.FileInfo(OutputFileName)

                    If ResultFile.Length <= 0 Then
                        ResultFile = Nothing

                        Try
                            System.IO.File.Delete(OutputFileName)
                        Catch ex As Exception
                            Throw
                        End Try
                    End If
                Else
                    ResultFile = Nothing

                    Try
                        System.IO.File.Delete(OutputFileName)
                    Catch ex As Exception
                        Throw
                    End Try
                End If
            Else
                ResultFile = Nothing
            End If
        End Try
    End If

    Return ResultFile
End Function

Upvotes: 0

Coder999

Reputation: 1

Some may need to change the line "writer = PdfWriter.GetInstance(pdfDoc, New FileStream(outputPath, FileMode.OpenOrCreate))", as iTextSharp may not support it.

Change it to:

Dim fs As IO.FileStream = New IO.FileStream(outputPath, IO.FileMode.Create)

writer = iTextSharp.text.pdf.PdfWriter.GetInstance(pdfDoc, fs)
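
The answer does not say what might be unsupported, and the following is only a guess at why the change can matter. One difference that holds for plain .NET, independent of iTextSharp: FileMode.OpenOrCreate opens an existing file without truncating it, so writing a shorter file over a longer one leaves the old tail in place, while FileMode.Create starts from an empty file. A minimal sketch of that difference (the WriteBytes helper and the scratch file name are made up for this demo):

```vbnet
Imports System
Imports System.IO

Module FileModeDemo
    ' Writes `length` zero bytes to filePath using the given FileMode
    ' and returns the resulting file size in bytes.
    Function WriteBytes(ByVal filePath As String, ByVal mode As FileMode, ByVal length As Integer) As Long
        Using fs As New FileStream(filePath, mode)
            fs.Write(New Byte(length - 1) {}, 0, length)
        End Using
        Return New FileInfo(filePath).Length
    End Function

    Sub Main()
        ' Hypothetical scratch file, used only for this demonstration
        Dim filePath As String = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "filemode-demo.bin")

        ' Start with an existing 10-byte file
        WriteBytes(filePath, FileMode.Create, 10)

        ' OpenOrCreate overwrites the first 3 bytes but keeps the old tail
        Dim afterOpenOrCreate As Long = WriteBytes(filePath, FileMode.OpenOrCreate, 3)

        ' Create truncates the file before writing
        Dim afterCreate As Long = WriteBytes(filePath, FileMode.Create, 3)

        Console.WriteLine(afterOpenOrCreate) ' 10
        Console.WriteLine(afterCreate)       ' 3

        File.Delete(filePath)
    End Sub
End Module
```

For a PDF, leftover bytes after the new end of the file could leave the output damaged, which is a reason to prefer FileMode.Create when the output path may already exist.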

Upvotes: -2

Sean Wessell

Reputation: 3510

I have a console application that monitors individual folders inside a designated folder and then needs to merge all of the PDFs in a folder into a single PDF. I pass an array of file paths as strings and the output file I would like.

This is the function I use.

Public Shared Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String) As Boolean
    Dim result As Boolean = False
    Dim pdfCount As Integer = 0     'total input pdf file count
    Dim f As Integer = 0    'pointer to current input pdf file
    Dim fileName As String
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim pageCount As Integer = 0
    Dim pdfDoc As iTextSharp.text.Document = Nothing    'the output pdf document
    Dim writer As PdfWriter = Nothing
    Dim cb As PdfContentByte = Nothing

    Dim page As PdfImportedPage = Nothing
    Dim rotation As Integer = 0

    Try
        pdfCount = pdfFiles.Length
        If pdfCount > 1 Then
            'Open the 1st item in the array PDFFiles
            fileName = pdfFiles(f)
            reader = New iTextSharp.text.pdf.PdfReader(fileName)
            'Get page count
            pageCount = reader.NumberOfPages

            pdfDoc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1), 18, 18, 18, 18)

            writer = PdfWriter.GetInstance(pdfDoc, New FileStream(outputPath, FileMode.OpenOrCreate))


            With pdfDoc
                .Open()
            End With
            'Instantiate a PdfContentByte object
            cb = writer.DirectContent
            'Now loop thru the input pdfs
            While f < pdfCount
                'Declare a page counter variable
                Dim i As Integer = 0
                'Loop thru the current input pdf's pages starting at page 1
                While i < pageCount
                    i += 1
                    'Get the input page size
                    pdfDoc.SetPageSize(reader.GetPageSizeWithRotation(i))
                    'Create a new page on the output document
                    pdfDoc.NewPage()
                    'Now we get the imported page
                    page = writer.GetImportedPage(reader, i)
                    'Read the imported page's rotation
                    rotation = reader.GetPageRotation(i)
                    'Then add the imported page to the PdfContentByte object as a template based on the page's rotation
                    If rotation = 90 Then
                        cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(i).Height)
                    ElseIf rotation = 270 Then
                        cb.AddTemplate(page, 0, 1.0F, -1.0F, 0, reader.GetPageSizeWithRotation(i).Width + 60, -30)
                    Else
                        cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0)
                    End If
                End While
                'Increment f and read the next input pdf file
                f += 1
                If f < pdfCount Then
                    fileName = pdfFiles(f)
                    reader = New iTextSharp.text.pdf.PdfReader(fileName)
                    pageCount = reader.NumberOfPages
                End If
            End While
            'When all done, we close the document so that the pdfwriter object can write it to the output file
            pdfDoc.Close()
            result = True
        End If
    Catch ex As Exception
        Return False
    End Try
    Return result
End Function
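
For the calling side, here is a rough sketch of how the array of paths might be collected before it is handed to the function. CollectPdfPaths, the module name, and the idea of sorting by file name are assumptions for illustration, not part of the function above; only System.IO is used.

```vbnet
Imports System
Imports System.IO

Module MergeInput
    ' Returns the paths of the PDF files directly inside `folder`,
    ' sorted by path (case-insensitive) so the merge order is predictable.
    Function CollectPdfPaths(ByVal folder As String) As String()
        Dim paths As String() = Directory.GetFiles(folder, "*.pdf")
        Array.Sort(paths, StringComparer.OrdinalIgnoreCase)
        Return paths
    End Function

    Sub Main(ByVal args As String())
        ' Folder to scan: first command-line argument, or the current directory
        Dim folder As String = If(args.Length > 0, args(0), Directory.GetCurrentDirectory())

        For Each pdfPath As String In CollectPdfPaths(folder)
            Console.WriteLine(pdfPath)
        Next
    End Sub
End Module
```

The returned array can be passed as the pdfFiles argument. Note that MergePdfFiles as posted only merges when pdfFiles.Length is greater than 1 and returns False otherwise, so a folder containing a single PDF needs separate handling. Writing the output file outside the monitored folder also keeps the merged PDF from being picked up as an input on the next pass.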

Upvotes: 5
