Reputation: 4735
So I have a gazillion PDFs in a folder, and I want to recursively shrink them (using os.path.walk). I see that Acrobat Pro has a "save as reduced size" option. Would I be able to use this, and if not, how do you suggest I do it?
Note: yes, I would like them to stay as PDFs, because PDF is the format with the most commonly used and installed viewers.
Upvotes: 25
Views: 72335
Reputation: 11849
The OP's question was about Acrobat Pro's "save as reduced size" option, and Acrobat Reader is in part a significantly cut-down Pro that can still rewrite a PDF as needed.
We can take advantage of that in a very simple manner, but it is not my suggested solution, for the reasons below.
Let us do a comparison. Note that Adobe Acrobat Reader is good, but not the best. I will start with a tool not yet mentioned: a best-in-class command-line PDF rebuilder, qpdf.
My starting point is a 463,937-byte, 15-page, mixed-content source.pdf. (It intentionally has one non-critical wrong byte in its "startxref" offset, the pointer that per the standard follows the PDF's "trailer".)
This is a comparison of PDF re-compression at 100% quality; any further compaction can only be achieved by degrading quality or by manual optimisation.
463,937 bytes source (regarding linearization, see the --linearize result below)
qpdf corrects any faults it perceives in PDF structures
471,146 bytes qpdf in.pdf --linearize out.pdf
470,039 bytes qpdf in.pdf out.pdf (normal rebuilt/repair PDF)
468,401 bytes qpdf in.pdf --optimize-images out.pdf
468,401 bytes qpdf --stream-data=compress --recompress-flate --optimize-images pdfsizeopt.pdf outq.pdf (Compress PDF)
You may thus wonder why, with all those options, the file is not reduced further: most PDF files are already optimised to an ISO-standard structure.
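The qpdf runs above are easy to script from Python via subprocess; here is a minimal sketch, assuming qpdf is on PATH (the build_qpdf_cmd/qpdf_rewrite names are mine, not from the answer):

```python
import subprocess

def build_qpdf_cmd(input_path, output_path, linearize=False, optimize_images=False):
    """Assemble a qpdf command line mirroring the variants compared above."""
    cmd = ["qpdf"]
    if linearize:
        cmd.append("--linearize")        # fast-web-view layout
    if optimize_images:
        cmd.append("--optimize-images")  # recompress images where it helps
    cmd += [input_path, output_path]
    return cmd

def qpdf_rewrite(input_path, output_path, **opts):
    """Rebuild/repair the PDF losslessly; raises if qpdf reports an error."""
    subprocess.run(build_qpdf_cmd(input_path, output_path, **opts), check=True)
    return output_path
```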
Surely there is some way to maintain quality and optimise more? Let's try some other PDF repair tools. Still not much reduction:
464,138 bytes typical minimal "native" recompression, "repaired/cleaned" without loss!
What about the tools mentioned by others? (See the -dFastWebView option below.)
350,824 bytes GhostScript -dFastWebView -sDEVICE=pdfwrite -o"%cd%\output.pdf" -f input.pdf
335,558 bytes Fixed AND Re-compressed as WebEnhanced by Adobe Reader DC
What about without web enhanced?
313,458 bytes cpdfSqueezed = 67.56% of original.
312,618 bytes GhostScript -sDEVICE=pdfwrite -o"%cd%\output.pdf" -f input.pdf
310,451 bytes internal PNGs optimised by PDFSizeOpt as suggested by others
So far PDFSizeOpt is the best contender, as it deliberately extracts bitmap images (perhaps PNG- or WebP-sourced) and re-optimises them.
A common misconception is that JPEG-based images can be compressed further; that is generally not the case (unless they benefit from bit-depth reduction, which is rare). DCT, i.e. JPEG, is already the best internal PDF compression method for such images, so there is no need to extract and modify JPEGs.
Upvotes: 1
Reputation: 309
Consider using Ghostscript, an open-source tool for processing PostScript and PDF files.
# To install Ghostscript, use: sudo apt install ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=compressed.pdf input.pdf
This significantly reduces the image quality of the PDF while preserving all other information, compressing a 25MB PDF paper to just 1.7MB.
Wrapped as a Python function:
def compress_pdf_file(input_path, output_path):
    """Compress a PDF with Ghostscript (install via: sudo apt install ghostscript)."""
    import subprocess
    subprocess.call(
        [
            "gs",
            "-sDEVICE=pdfwrite",
            "-dCompatibilityLevel=1.4",
            "-dPDFSETTINGS=/screen",  # smallest / lowest-quality preset
            "-dNOPAUSE",
            "-dQUIET",
            "-dBATCH",
            "-sOutputFile=" + output_path,
            input_path,
        ]
    )
    return output_path
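To meet the OP's recursive requirement, os.walk can drive a compressor like the function above over a whole tree; a minimal sketch (find_pdfs is my name, and compress_pdf_file stands for the function defined in this answer):

```python
import os

def find_pdfs(root):
    """Yield the path of every .pdf file under root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)

# Example driver (compress_pdf_file as defined above):
# for pdf in find_pdfs("/path/to/folder"):
#     compress_pdf_file(pdf, pdf[:-4] + "_small.pdf")
```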
Upvotes: 0
Reputation: 12940
pdfsizeopt was shrinking the last page of my PDF.
However, the solution provided in a now-deleted answer was useful: the tool pdfc, written in Python, hosted on GitHub and updated from time to time, happened to work fine for me.
You can download the Python file pdf_compressor.py from the repo: https://github.com/theeko74/pdfc/blob/master/pdf_compressor.py
Provided you have Ghostscript installed, you can then run the following:
python pdf_compressor.py <PDF-input-file> --backup
More details on the options available in the README of the repo: https://github.com/theeko74/pdfc
Upvotes: 1
Reputation: 21911
From the project's GitHub page for pdfsizeopt, which is written in Python:
"pdfsizeopt is a program for converting large PDF files to small ones. More specifically, pdfsizeopt is a free, cross-platform command-line application (for Linux, Mac OS X, Windows and Unix) and a collection of best practices to optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is written in Python..."
You can probably easily adapt this to your specific needs.
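For instance, since pdfsizeopt is a command-line tool (pdfsizeopt input.pdf output.pdf), one adaptation is to call it from Python per file; a sketch assuming the pdfsizeopt executable is on PATH (the opt_name/optimize_pdf names are mine):

```python
import os
import shutil
import subprocess

def opt_name(path):
    """Derive an output name next to the input, e.g. a.pdf -> a.opt.pdf."""
    base, _ext = os.path.splitext(path)
    return base + ".opt.pdf"

def optimize_pdf(src):
    """Run pdfsizeopt on one file; raises if the tool is missing or fails."""
    if shutil.which("pdfsizeopt") is None:
        raise RuntimeError("pdfsizeopt not found on PATH")
    dst = opt_name(src)
    subprocess.run(["pdfsizeopt", src, dst], check=True)
    return dst
```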
Upvotes: 12
Reputation: 252
I realize this is an old question, but I thought I would suggest an alternative to pdfsizeopt, as I have experienced quality loss using it on PDFs of maps. PDFTron offers a comprehensive set of functionality. Here is a snippet modified from their web page (see "example 1"):
import site
site.addsitedir(r"...pathToPDFTron\PDFNetWrappersWin32\PDFNetC\Lib")  # adjust to your PDFNet install
from PDFNetPython import PDFDoc, Optimizer, SDFDoc

doc = PDFDoc(inPDF_Path)                    # inPDF_Path: path to the source PDF
doc.InitSecurityHandler()                   # handle any encryption on the document
Optimizer.Optimize(doc)                     # optimize images, fonts and streams
doc.Save(outPDF_Path, SDFDoc.e_linearized)  # outPDF_Path: linearized (fast web view) output
doc.Close()
Upvotes: 9