xormar

Reputation: 103

How to extract attached files from PDF with itext7

How does one extract attached files from a PDF with iText 7?

The iText 5 sample code I found no longer works.

What I need is one byte[] per file, as in the iText 5 example below:

    PdfReader reader = new PdfReader(SRC);
    Map<String, byte[]> files = new HashMap<String,byte[]>();
    PdfObject obj;

    for (int i = 1; i <= reader.getXrefSize(); i++) {
        obj = reader.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            PRStream stream = (PRStream)obj;
            byte[] b;
            try {
                b = PdfReader.getStreamBytes(stream);
            }
            catch(UnsupportedPdfException e) {
                b = PdfReader.getStreamBytesRaw(stream);
            }
            files.put(Integer.toString(i), b);
        }
    }

Thx /markus

Upvotes: 0

Views: 4774

Answers (1)

Bruno Lowagie

Reputation: 77606

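You are searching for attachments by brute force, looping over every object in the file, so you get every stream and not only the attachments. The targeted approach is to query the catalog for embedded files and to query the page dictionaries for attachment annotations.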

Anyway, if I were to port your code to iText 7, it would look like this:

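    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfObject;
    import com.itextpdf.kernel.pdf.PdfReader;
    import com.itextpdf.kernel.pdf.PdfStream;
    import java.io.FileOutputStream;
    // PdfException: com.itextpdf.kernel (7.0/7.1) or com.itextpdf.kernel.exceptions (7.2+)

    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC));
    // Visit every indirect object in the file and dump each stream.
    for (int i = 1; i <= pdfDoc.getNumberOfPdfObjects(); i++) {
        PdfObject obj = pdfDoc.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            byte[] b;
            try {
                b = ((PdfStream) obj).getBytes();       // decoded bytes
            } catch (PdfException exc) {
                b = ((PdfStream) obj).getBytes(false);  // fall back to the raw, still-encoded bytes
            }
            // DEST is a format string with a placeholder for the object number.
            try (FileOutputStream fos = new FileOutputStream(String.format(DEST, i))) {
                fos.write(b);
            }
        }
    }
    pdfDoc.close();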

The only change I made is that I write each stream to a file.
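
For comparison, the targeted lookup mentioned at the start of this answer (catalog embedded files plus attachment annotations on pages) could be sketched roughly as below. This is a minimal sketch, not an official iText sample: the class name `AttachmentExtractor` and its helper methods are hypothetical, it works on the low-level dictionaries only, and it assumes every file specification embeds its data under `/EF`. Entries with the same file name overwrite each other in the map.

```java
import com.itextpdf.kernel.pdf.PdfArray;
import com.itextpdf.kernel.pdf.PdfDictionary;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfStream;
import com.itextpdf.kernel.pdf.PdfString;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class AttachmentExtractor {

    /** Returns file name -> bytes for document-level and annotation attachments. */
    public static Map<String, byte[]> extract(String src) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        PdfDocument pdfDoc = new PdfDocument(new PdfReader(src));
        try {
            // Document-level attachments: catalog -> /Names -> /EmbeddedFiles (a name tree).
            PdfDictionary names = pdfDoc.getCatalog().getPdfObject().getAsDictionary(PdfName.Names);
            if (names != null) {
                walkNameTree(names.getAsDictionary(PdfName.EmbeddedFiles), files);
            }
            // Page-level attachments: /FileAttachment annotations keep a file spec in /FS.
            for (int p = 1; p <= pdfDoc.getNumberOfPages(); p++) {
                PdfArray annots = pdfDoc.getPage(p).getPdfObject().getAsArray(PdfName.Annots);
                if (annots == null) continue;
                for (int i = 0; i < annots.size(); i++) {
                    PdfDictionary annot = annots.getAsDictionary(i);
                    if (annot != null && PdfName.FileAttachment.equals(annot.getAsName(PdfName.Subtype))) {
                        addFileSpec(annot.getAsDictionary(PdfName.FS), "page" + p + "_annot" + i, files);
                    }
                }
            }
        } finally {
            pdfDoc.close();
        }
        return files;
    }

    // A name tree node holds [key, value, key, value, ...] in /Names and/or child nodes in /Kids.
    private static void walkNameTree(PdfDictionary node, Map<String, byte[]> files) {
        if (node == null) return;
        PdfArray leaf = node.getAsArray(PdfName.Names);
        if (leaf != null) {
            for (int i = 0; i + 1 < leaf.size(); i += 2) {
                PdfString key = leaf.getAsString(i);
                String fallback = key == null ? "unnamed" + i : key.toUnicodeString();
                addFileSpec(leaf.getAsDictionary(i + 1), fallback, files);
            }
        }
        PdfArray kids = node.getAsArray(PdfName.Kids);
        if (kids != null) {
            for (int i = 0; i < kids.size(); i++) {
                walkNameTree(kids.getAsDictionary(i), files);
            }
        }
    }

    // Reads the embedded stream from a file spec; prefers the Unicode entries (/UF) over /F.
    private static void addFileSpec(PdfDictionary fileSpec, String fallbackName, Map<String, byte[]> files) {
        if (fileSpec == null) return;
        PdfDictionary ef = fileSpec.getAsDictionary(PdfName.EF);
        if (ef == null) return; // no embedded data (for example, an external file reference)
        PdfStream stream = ef.getAsStream(PdfName.UF);
        if (stream == null) stream = ef.getAsStream(PdfName.F);
        if (stream == null) return;
        PdfString name = fileSpec.getAsString(PdfName.UF);
        if (name == null) name = fileSpec.getAsString(PdfName.F);
        files.put(name == null ? fallbackName : name.toUnicodeString(), stream.getBytes());
    }
}
```

With this sketch, `AttachmentExtractor.extract(SRC)` would return the `Map<String, byte[]>` asked for in the question, keyed by file name instead of object number.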

Upvotes: 2
