Ivan Kuckir

Reputation: 2549

PDF: Object Stream with FlateDecode

In my PDF, there is an object

<</Filter/FlateDecode/First 721/Length 3424/N 79/Type/ObjStm>>stream

The raw data on the next line starts with the bytes

eKoq...  precisely [101, 75, 111, 113, 22, 229, 156, 253, 116, ...

My Flate decoder fails on this input. How should it be processed then?

http://s000.tinyupload.com/?file_id=25511328881895019912

Upvotes: 4

Views: 21893

Answers (3)

dwarring

Reputation: 4883

This PDF is encrypted. The PDF file trailer is:

endobj
startxref
116
%%EOF

The cross-reference stream at byte offset 116 (with some formatting) is:

<</DecodeParms<</Columns 5/Predictor 12>>
   /Encrypt 389 0 R
   % ... etc
   /Type/XRef /W[1 3 1]
 >> stream
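As a rough sketch of how such a cross-reference stream can be read once it has been inflated (cross-reference streams are not encrypted), the following undoes the PNG "Up" predictor for /Columns 5 and splits each row into fields according to /W [1 3 1]. The function names are hypothetical, and the sketch assumes every row uses PNG filter type 0 or 2, which a given file need not do.

```python
def undo_png_up(data: bytes, columns: int) -> bytes:
    """Undo PNG prediction for rows of `columns` bytes, each led by a tag byte."""
    row_len = columns + 1
    prev = bytes(columns)          # the row above the first row is all zeros
    out = bytearray()
    for i in range(0, len(data), row_len):
        tag = data[i]
        row = data[i + 1:i + row_len]
        if tag == 0:               # None: bytes are stored as-is
            cur = bytes(row)
        elif tag == 2:             # Up: add the byte directly above
            cur = bytes((b + p) & 0xFF for b, p in zip(row, prev))
        else:
            raise ValueError(f"PNG filter type {tag} not handled in this sketch")
        out += cur
        prev = cur
    return bytes(out)


def split_entries(rows: bytes, widths=(1, 3, 1)):
    """Split decoded rows into big-endian integer fields of the given widths."""
    size = sum(widths)
    entries = []
    for i in range(0, len(rows), size):
        pos, fields = i, []
        for w in widths:
            fields.append(int.from_bytes(rows[pos:pos + w], "big"))
            pos += w
        entries.append(tuple(fields))
    return entries


# Two made-up rows, each a tag byte (2 = Up) followed by 5 data bytes.
sample = bytes([2, 1, 0, 0, 16, 0,
                2, 1, 0, 0, 4, 0])
entries = split_entries(undo_png_up(sample, columns=5))
```

In each entry the first field is the entry type (1 for an uncompressed object, 2 for an object stored inside an object stream), so the second field is either a byte offset or the number of the containing object stream.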

The encryption dictionary 389 0 R (formatted) is:

389 0 obj <<
  /CF <<
    /StdCF <<
      /AuthEvent /DocOpen
      /CFM /AESV2
      /Length 16
    >>
  >>
  /EncryptMetadata false
  /Filter /Standard
  /O (...)  % binary owner key
  /P -1084
  /R 4
  /StmF /StdCF
  /StrF /StdCF
  /U (...)  % binary user key
  /V 4
  /Length 128
>>
endobj

The PDF specification (ISO 32000) states:

7.6.1 General A PDF document can be encrypted (PDF 1.1) to protect its contents from unauthorized access. Encryption applies to all strings and streams in the document's PDF file, with the following exceptions:
• The values for the ID entry in the trailer
• Any strings in an Encrypt dictionary
• Any strings that are inside streams such as content streams and compressed object streams, which themselves are encrypted

The referenced object is an object stream in an encrypted PDF. To process this stream, you need to implement decryption (AESV2 in this case) and decrypt the stream data before applying its other filters, such as FlateDecode.
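A quick way to confirm that the bytes from the question are not plain Flate data is to check them against the zlib header rules in RFC 1950. This is only a sketch, and the helper name `looks_like_zlib` is made up for illustration.

```python
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Check the two-byte zlib header (CMF, FLG) as described in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (
        (cmf & 0x0F) == 8                 # compression method must be deflate
        and (cmf >> 4) <= 7               # window size field must be at most 7
        and (cmf * 256 + flg) % 31 == 0   # header checksum
    )


# First bytes of the stream quoted in the question.
question_bytes = bytes([101, 75, 111, 113, 22, 229, 156, 253, 116])
encrypted_looking = not looks_like_zlib(question_bytes)
```

The first byte, 101 (0x65), has a low nibble of 5 rather than 8, so a conforming Flate decoder is right to reject it. The data has to be decrypted first.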

Note: this PDF is encrypted with a blank user password, so it opens in most viewers without the need to enter a user password.
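As a partial sketch of what decryption involves for /V 4 with AESV2: each object gets its own key, derived from the file encryption key plus the object and generation numbers (Algorithm 1 in ISO 32000-1, section 7.6.2), and the first 16 bytes of each encrypted stream are the AES-CBC initialization vector. The code below assumes the file key has already been computed from the (blank) user password, which is not shown, and it leaves the AES step itself to a crypto library. The function names and the object number 12 are hypothetical.

```python
import hashlib


def object_key(file_key: bytes, obj_num: int, gen_num: int, aes: bool = True) -> bytes:
    """Derive the per-object key from the file encryption key."""
    data = (
        file_key
        + obj_num.to_bytes(4, "little")[:3]   # low-order 3 bytes of the object number
        + gen_num.to_bytes(4, "little")[:2]   # low-order 2 bytes of the generation number
    )
    if aes:
        data += b"sAlT"                       # extra salt used only for AES
    return hashlib.md5(data).digest()[:min(len(file_key) + 5, 16)]


def split_aes_stream(raw: bytes):
    """Split an AESV2-encrypted stream into its IV and the CBC ciphertext."""
    if len(raw) < 32 or len(raw) % 16 != 0:
        raise ValueError("not a plausible AES-CBC stream")
    return raw[:16], raw[16:]


# Placeholder values: an all-zero 16-byte file key and object number 12.
key = object_key(bytes(16), obj_num=12, gen_num=0)
iv, ciphertext = split_aes_stream(bytes(3424))
```

The /Length of 3424 in the question is a multiple of 16, which is consistent with AES-CBC data. After decrypting the ciphertext with the derived key and IV and removing the padding, the result can be passed to the Flate decoder.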

Upvotes: 10

pilotandy

Reputation: 87

You have <<...>>stream(blah blah)endstream

First use zlib to inflate the (blah blah) stream data.

If you use Python 3, it's really simple. Just grab all the data between stream and endstream, and pass it through zlib.

import zlib

results = zlib.decompress(b'(blah blah)')  # the bytes between stream and endstream

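If you're using C++ and the zlib library, use a function like this.

#include <new>
#include <string>
#include <vector>
#include <zlib.h>

// Inflate zlib-wrapped data, tripling the output buffer until the result fits.
// Named inflateString so it does not clash with zlib's own inflate().
int inflateString(const std::string &source, std::string &destination)
{
    // Start from at least one byte so the buffer grows even for empty input.
    uLongf capacity = source.empty() ? 1 : static_cast<uLongf>(source.size());
    int err = Z_BUF_ERROR;

    while (err == Z_BUF_ERROR)
    {
        capacity *= 3;
        std::vector<Bytef> dest;
        try { dest.resize(capacity); }
        catch (const std::bad_alloc &) { return Z_MEM_ERROR; }

        uLongf destLen = capacity;
        err = uncompress(dest.data(), &destLen,
                         reinterpret_cast<const Bytef *>(source.data()),
                         static_cast<uLong>(source.size()));
        if (err == Z_OK)
        {
            // Copy the output only on success; destLen holds the inflated size.
            destination.assign(reinterpret_cast<const char *>(dest.data()), destLen);
        }
    }

    return err;
}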

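The inflated content will be a sequence of N pairs of numbers followed by some PDF objects (usually dictionaries), e.g. "123 0 124 25 <<...>><<...>>"

In this example, 123 is the object number of the first object, and 0 is its byte offset. Offsets are counted from the first object in the stream, which starts at byte /First of the inflated data, right after the N pairs.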
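A minimal sketch of that layout in Python, assuming the stream has already been decrypted (if needed) and inflated, and that the offsets are in increasing order. The function name `split_object_stream` and the sample data are made up for illustration; `n` and `first` come from the /N and /First entries of the stream dictionary.

```python
def split_object_stream(decoded: bytes, n: int, first: int) -> dict:
    """Return {object number: raw object bytes} for a decoded object stream."""
    numbers = [int(tok) for tok in decoded[:first].split()]
    if len(numbers) < 2 * n:
        raise ValueError("header holds fewer than N pairs")
    pairs = [(numbers[2 * i], numbers[2 * i + 1]) for i in range(n)]

    objects = {}
    for idx, (obj_num, offset) in enumerate(pairs):
        start = first + offset     # offsets are relative to /First
        # Each object ends where the next one starts; the last runs to the end.
        end = first + pairs[idx + 1][1] if idx + 1 < n else len(decoded)
        objects[obj_num] = decoded[start:end].strip()
    return objects


# Made-up stream: a 12-byte header with two pairs, then two dictionaries.
sample = b"123 0 124 9 " + b"<</A 1>> <</B 2>>"
objects = split_object_stream(sample, n=2, first=12)
```

For the stream in the question, the call would use n=79 and first=721, taken from its dictionary.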

For further reading, see pages 53 and 54 of the specification: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

Upvotes: -2

JosephA

Reputation: 1215

If it crashes, that would indicate you have a bug in your Flate decoder. I can't examine it, but even if the stream is invalid, your PDF software ideally shouldn't crash.
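One way to keep invalid input from taking the program down, sketched here with Python's standard zlib module rather than a hand-written decoder, is to treat a decoding failure as an ordinary result. The name `try_inflate` is hypothetical.

```python
import zlib
from typing import Optional


def try_inflate(data: bytes) -> Optional[bytes]:
    """Inflate zlib-wrapped data, returning None instead of raising on bad input."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        # Invalid header, corrupt data, or a truncated stream.
        return None


# The bytes from the question are rejected cleanly instead of causing a crash.
result = try_inflate(bytes([101, 75, 111, 113, 22, 229, 156, 253, 116]))
```

A caller can then report the failure or try another interpretation of the data, such as decrypting it first.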

Upvotes: -2
