user1379634

Reputation: 41

Remove Black Rectangles from PDF using Python (PikePDF or PyPDF2)

Please help me surprise my wife with a useful PDF of her iMessage chain with her now deceased grandmother.

Apple Messages allows you to print conversations to PDF. You have to manually scroll to the top of the message on the Mac, over and over again. It took me 4 hours!

However, it places a black rectangle over most photos. This seems to be a half-hearted attempt at privacy, because the photo is still there under the black rectangle.

I can use Edit PDF in Adobe Acrobat to remove the main black rectangle, each of the four side rectangles and the four corner rectangles. But doing this for every image in the chain would take an extraordinary amount of time, as the chain goes back 4 years and they texted a lot.

I'm reasonably savvy with Python and have tried to work through using PikePDF and PyPDF2 to do this, but I can't make any headway owing to the complex structure of PDFs.

Note that the extraordinarily long page size of the PDF is deliberate: when you print to PDF in Messages, it doesn't handle images across page breaks very well, so I set a custom page height of 200 inches to get far fewer page breaks.

Example PDF at the link below. It is one page, with most of the content removed (using Edit PDF in Acrobat) for privacy, and one photo (of my baby son) that is blocked by the aforementioned black rectangle. The target document is 63 such pages (each 200 inches high) and 1.73 GB in size, so you can understand why doing this manually is a wee bit impractical.

Please help me, internet. It would mean so much to my wife!

Edit to include previously left out link to sample file: https://www.dropbox.com/s/t4dgwr5eylb4rfm/MessagesBlackBoxExample.pdf?dl=0

Upvotes: 0

Views: 426

Answers (0)
