Reputation: 9425
Most PDF files found on the Web have compressed, unreadable data streams. Is it possible to uncompress the internal content of a PDF file using Acrobat or Acrobat Distiller, so that the source can be read in a text editor?
P.S. This question is inspired by this answer, which explains how it can be done with Ghostscript.
Upvotes: 23
Views: 22701
Reputation: 90193
qpdf and pdftk have already been mentioned. To show the commands:
$ qpdf --qdf --object-streams=disable orig.pdf uncompressed-orig.pdf
$ pdftk orig.pdf output uncompressed-orig.pdf uncompress
mutool, however, hasn't been mentioned yet:
$ mutool clean -d -a orig.pdf uncompressed-orig.pdf
mutool is a command-line tool which ships alongside the lightweight MuPDF PDF + document viewer.
I do not think you can uncompress the streams of PDF objects with Acrobat or Distiller (unless you have additional payware plugins available).
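To illustrate what these tools undo: most compressed PDF streams use the /FlateDecode filter, which is zlib compression. The following is only a minimal sketch in Python. It assumes a hand-built content stream rather than a real PDF file, and the helper name `inflate_stream` and the sample operators are made up for illustration.

```python
import zlib


def inflate_stream(raw: bytes) -> bytes:
    """Decompress the body of a /FlateDecode stream (zlib-wrapped deflate)."""
    return zlib.decompress(raw)


# Hypothetical page content: show the text "Hello" in font /F1 at 12 pt.
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

# Roughly what a PDF writer stores between "stream" and "endstream".
compressed = zlib.compress(content)

# What you would see in a text editor after uncompressing.
restored = inflate_stream(compressed)
print(restored.decode("latin-1"))
```

A real file may also use other filters or object streams, which is why a dedicated tool is the safer route.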
Upvotes: 27
Reputation: 2753
Use cpdf:
cpdf -decompress in.pdf -o out.pdf
and then the graphic operators for each page can be read in a text editor. You'll need a copy of the standard as a reference, though.
Disclosure: I am the author of cpdf.
Upvotes: 18
Reputation: 4591
This is easy with qpdf and pdftk.
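As a rough sketch, the qpdf and pdftk invocations quoted in the other answer could be wrapped in a small Python helper that picks whichever tool is installed. The function name `build_command` is hypothetical, and the sketch assumes the tools are on the PATH; the flags are exactly the ones shown in this thread.

```python
import shutil


def build_command(src, dst):
    """Return an argument list for the first available tool, or None."""
    candidates = {
        "qpdf": ["qpdf", "--qdf", "--object-streams=disable", src, dst],
        "pdftk": ["pdftk", src, "output", dst, "uncompress"],
    }
    for tool, argv in candidates.items():
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool):
            return argv
    return None


command = build_command("orig.pdf", "uncompressed-orig.pdf")
print(command if command else "neither qpdf nor pdftk found")
```

If a command is returned, it can be passed to `subprocess.run(command, check=True)` to do the actual conversion.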
With Adobe Acrobat you can get at the internal structure after profiling a PDF: run Preflight with some profile (e.g. one that detects PDF syntax errors), then choose Options->Internal PDF structure. However, there's no way to get something editable with a text editor.
Upvotes: 7