Ruibin Wu

Reputation: 31

Cropped PDF still contains the original text

I have written a program that crops a PDF using PyPDF2 and exports the result as a new file.

For example, cropping 123.pdf outputs 123 - NEW.pdf.

However, when I print the result, the printer outputs the original PDF rather than the cropped one, even though I pass the new file to the print command.

import os
from PyPDF2 import PdfReader, PdfWriter

order_number = entry1.get()
final = order_number.upper()
edit = PdfReader(f"{pathfile}{order_number} - Shipping Label.pdf")
output = PdfWriter()

page = edit.pages[0]
page.cropbox.upper_left = (63.389830508474574, 643.7972508591065)
page.cropbox.lower_right = (561.8644067796611, 483.2096219931271)
page.rotate(90)
output.add_page(page)

page = edit.pages[1]
page.cropbox.upper_left = (32.45454545454545, 545.4601542416452)
page.cropbox.lower_right = (556.7757575757576, 238.09768637532136)
page.rotate(90)
output.add_page(page)

with open(f"{pathfile}{final}.pdf", "wb") as fp:
    output.write(fp)
os.system(f"lpr -P Munbyn_Printer_1 {pathfile}{final}.pdf")

As you can see, I want to print {final}.pdf, but the original is printed instead.

Upvotes: 0

Views: 241

Answers (1)

Martin Thoma

Reputation: 136845

A PDF page has several boxes (mediabox, cropbox, trimbox, artbox, bleedbox). These define "views" on the underlying document.

You changed the view. That does not affect the contained text at all. It only affects what is shown when you open the file in a viewer (or when you attempt to print it).

The text is still in the file, and because many text extraction tools ignore the boxes, they will still return it.
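If the print pipeline lays the page out by its mediabox and ignores the cropbox (an assumption about your setup that I have not verified for lpr), one thing to try is setting both boxes to the same rectangle. Below is a minimal sketch, assuming PyPDF2 3.x as in the question. The helper names corners_to_rect and shrink_page_to are made up for this example; they are not part of PyPDF2.

```python
def corners_to_rect(upper_left, lower_right):
    """Convert (upper-left, lower-right) corners to (llx, lly, urx, ury)."""
    left, top = upper_left
    right, bottom = lower_right
    return (min(left, right), min(top, bottom), max(left, right), max(top, bottom))


def shrink_page_to(page, upper_left, lower_right):
    """Set the cropbox and the mediabox of a PyPDF2 page to the same rectangle.

    Hypothetical helper: it only narrows the "views" on the page.
    The content outside the rectangle is not deleted from the file.
    """
    # Imported lazily so the pure helper above works without PyPDF2 installed.
    from PyPDF2.generic import RectangleObject

    rect = corners_to_rect(upper_left, lower_right)
    page.cropbox = RectangleObject(rect)
    page.mediabox = RectangleObject(rect)
    return page
```

In the question's code this would replace the two cropbox assignments per page, e.g. shrink_page_to(page, (63.389830508474574, 643.7972508591065), (561.8644067796611, 483.2096219931271)) before page.rotate(90). Even then the text outside the rectangle remains in the file; only the visible area changes.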

Upvotes: 0
