Mitch

Reputation: 669

iTextSharp 5.5.6 PdfCopy Failing with "Cannot access a closed file"

This seems to be similar to this question: Merging Tagged PDF without ruining the tags

I'm using the latest iTextSharp NuGet package (v5.5.6) trying to merge two tagged PDFs. When calling Document.Close() I'm getting an ObjectDisposedException originating from PdfCopy.FlushIndirectObjects().

at System.IO.__Error.FileNotOpen()
at System.IO.FileStream.get_Position()
at iTextSharp.text.io.RAFRandomAccessSource.Get(Int64 position, Byte[] bytes, Int32 off, Int32 len) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\io\RAFRandomAccessSource.cs:line 96
at iTextSharp.text.io.IndependentRandomAccessSource.Get(Int64 position, Byte[] bytes, Int32 off, Int32 len) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\io\IndependentRandomAccessSource.cs:line 76
at iTextSharp.text.pdf.RandomAccessFileOrArray.Read(Byte[] b, Int32 off, Int32 len) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\RandomAccessFileOrArray.cs:line 235
at iTextSharp.text.pdf.RandomAccessFileOrArray.ReadFully(Byte[] b, Int32 off, Int32 len) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\RandomAccessFileOrArray.cs:line 264
at iTextSharp.text.pdf.RandomAccessFileOrArray.ReadFully(Byte[] b) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\RandomAccessFileOrArray.cs:line 254
at iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw(PRStream stream, RandomAccessFileOrArray file) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfReader.cs:line 2406
at iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw(PRStream stream) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfReader.cs:line 2443
at iTextSharp.text.pdf.PRStream.ToPdf(PdfWriter writer, Stream os) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PRStream.cs:line 224
at iTextSharp.text.pdf.PdfIndirectObject.WriteTo(Stream os) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfIndirectObject.cs:line 157
at iTextSharp.text.pdf.PdfWriter.PdfBody.Write(PdfIndirectObject indirect, Int32 refNumber, Int32 generation) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfWriter.cs:line 389
at iTextSharp.text.pdf.PdfWriter.PdfBody.Add(PdfObject objecta, Int32 refNumber, Int32 generation, Boolean inObjStm) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfWriter.cs:line 379
at iTextSharp.text.pdf.PdfCopy.WriteObjectToBody(PdfIndirectObject objecta) in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfCopy.cs:line 1238
at iTextSharp.text.pdf.PdfCopy.FlushIndirectObjects() in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfCopy.cs:line 1186
at iTextSharp.text.pdf.PdfCopy.FlushTaggedObjects() in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfCopy.cs:line 884
at iTextSharp.text.pdf.PdfDocument.Close() in d:\Downloads\itextsharp-master\src\core\iTextSharp\text\pdf\PdfDocument.cs:line 825

Here is the code that is producing the exception. If I don't call copy.SetTagged() and don't pass true as the third argument to GetImportedPage() the code executes without exception, but ignores all tagging.

using(var ms = new MemoryStream())
{
    var doc = new Document();
    var copy = new PdfSmartCopy(doc, ms);
    copy.SetTagged();
    doc.Open();

    string[] files = new string[]{@"d:\tagged.pdf", @"d:\tagged.pdf"};
    foreach(var f in files)
    {
        var reader = new PdfReader(f);
        int pages = reader.NumberOfPages;
        for(int i = 0; i < pages;)
            copy.AddPage(copy.GetImportedPage(reader, ++i, true));
        copy.FreeReader(reader);
        reader.Close();
    }

    // ObjectDisposedException
    doc.Close();

    ms.Flush();
    File.WriteAllBytes(@"d:\pdf.merged.v5.pdf", ms.ToArray());
}

Looking at the 5.5.6 source branch it looks like RAFRandomAccessSource.cs line 96 is the culprit.

public virtual int Get(long position, byte[] bytes, int off, int len) {
   if (position > length)
      return -1;

   // Not thread safe!
   if (raf.Position != position)

The raf stream has already been disposed by the time raf.Position is read, but I can't tell where it was disposed.

I'm hoping that I just need to do something more than simply call copy.SetTagged() and pass true to GetImportedPage() to fix the issue.

Upvotes: 2

Views: 3388

Answers (1)

Bruno Lowagie

Reputation: 77546

You are closing the PdfReader instances too early. You can only trigger:

reader.Close();

after you close the PdfSmartCopy instance, hence you have to rethink where you create the different PdfReader objects (not inside the loop).

The reason the different PdfReader instances have to remain open is purely technical: merging structure trees (where all the tagging information is stored) isn't trivial. The merge can only happen once all the other work is done, and it requires access to the original structures of the separate documents. If you close the PdfReader for such a document, that structure can no longer be retrieved.
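A minimal sketch of that reordering, assuming the same iTextSharp 5.5.6 calls used in the question. The TaggedMerge class, the Merge method, and the readers list are hypothetical names introduced only for illustration. The sketch keeps every PdfReader open until doc.Close() has flushed the tagged structure, and only then closes them. It also leaves out the copy.FreeReader(reader) call from the question, since the answer above does not say whether that call is safe before the copy is closed.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
static class TaggedMerge
{
    public static byte[] Merge(IEnumerable<string> files)
    {
        // Keep a reference to every reader so none is closed too early.
        var readers = new List<PdfReader>();
        try
        {
            using (var ms = new MemoryStream())
            {
                var doc = new Document();
                var copy = new PdfSmartCopy(doc, ms);
                copy.SetTagged();
                doc.Open();

                foreach (var f in files)
                {
                    var reader = new PdfReader(f);
                    readers.Add(reader);

                    // Page numbers are 1-based; 'true' keeps the tag structure.
                    for (int i = 1; i <= reader.NumberOfPages; i++)
                        copy.AddPage(copy.GetImportedPage(reader, i, true));
                }

                // The structure trees are merged here, so the readers
                // must still be open at this point.
                doc.Close();

                // ToArray() still works after the stream has been closed.
                return ms.ToArray();
            }
        }
        finally
        {
            // Safe to release the source files only after doc.Close().
            foreach (var r in readers)
                r.Close();
        }
    }
}
```

With the paths from the question, this could be called as File.WriteAllBytes(@"d:\pdf.merged.v5.pdf", TaggedMerge.Merge(new[] { @"d:\tagged.pdf", @"d:\tagged.pdf" })).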

Upvotes: 2
