Mitch

Reputation: 118

iTextSharp PdfReader: reading only certain pages of a PDF

I'm doing some work using iTextSharp that has to read PDFs of anywhere from 100 to 200,000 pages, and it can sometimes take upwards of 10 minutes just to create the PdfReader! I've been searching for a way to read only certain pages at a time, so that it doesn't hold the whole PDF in memory at once, but I haven't been able to find anything. Does anyone know if this is possible in iTextSharp?

Upvotes: 1

Views: 1064

Answers (1)

mkl

Reputation: 96064

The PDF format allows you to restrict yourself to reading only the sections of interest to you; you don't have to read the whole file to find specific information.

The iText(Sharp) PdfReader optionally supports this if it is initialized in partial mode; cf. the master constructor that all other constructors rely on:

/**
 * Constructs a new PdfReader.  This is the master constructor.
 * @param byteSource source of bytes for the reader
 * @param partialRead if true, the reader is opened in partial mode (PDF is parsed on demand), if false, the entire PDF is parsed into memory as the reader opens
 * @param ownerPassword the password or null if no password is required
 * @param certificate the certificate or null if no certificate is required
 * @param certificateKey the key or null if no certificate key is required
 * @param certificateKeyProvider the name of the key provider, or null if no key is required
 * @param closeSourceOnConstructorError if true, the byteSource will be closed if there is an error during construction of this reader
 */
private PdfReader(IRandomAccessSource byteSource, bool partialRead, byte[] ownerPassword, X509Certificate certificate, ICipherParameters certificateKey, bool closeSourceOnConstructorError)

Unfortunately this master constructor is private. Thus, we have to look for constructors allowing us to use true as value of the bool partialRead. The public constructors allowing this are:

public PdfReader(String filename, byte[] ownerPassword, bool partial)

and

[Obsolete("Use the constructor that takes a RandomAccessFileOrArray")]
public PdfReader(RandomAccessFileOrArray raf, byte[] ownerPassword)

(the latter one always using the partial mode).

Thus, if you open a PDF from the file system, use the former constructor with partial = true, and otherwise create an appropriate RandomAccessFileOrArray instance and use the latter one. If no password is required, set ownerPassword = null.
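For the file system case, a minimal sketch might look like the following. It assumes iTextSharp 5.x; the file name big.pdf, the page number 5 and the use of PdfTextExtractor to do something with the page are placeholders of mine, not something from the question.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

class PartialReadExample
{
    static void Main()
    {
        // Hypothetical input file and page number; substitute your own.
        string path = "big.pdf";
        int pageNumber = 5;

        // ownerPassword = null (no password required), partial = true:
        // the reader parses objects on demand instead of loading the whole PDF.
        PdfReader reader = new PdfReader(path, null, true);
        try
        {
            Console.WriteLine("Pages: " + reader.NumberOfPages);

            // Only the objects needed for this one page are read from the file.
            string text = PdfTextExtractor.GetTextFromPage(reader, pageNumber);
            Console.WriteLine(text);
        }
        finally
        {
            // Releases the underlying file handle.
            reader.Close();
        }
    }
}
```

How much time this saves depends on the file, so it is worth measuring the constructor call with and without partial = true on one of your large documents.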

Alternatively some introspection/reflection magic may allow you to directly use the master constructor.

By the way, the latter constructor is the one @ChrisHaas pointed towards in his comment. Unfortunately, it has since been declared obsolete (i.e. deprecated).


Ceterum censeo important functionality (like the partial mode) shall be made easy to use.

Upvotes: 1
