user2012677

Reputation: 5765

Split long PDF page into multiple pages

What is the best method for splitting one very long PDF page into separate pages? In this case, the single page is an image made up of what were originally multiple letter-size pages, with a black line wherever a page break should be. To be clear, it is a single PDF document with a single page. That page is an image of hundreds of pages, so it is a very long image.

https://filebin.net/h2wiqckndsugnr1o/sample_pdf_long3.pdf

The pages within the image are not all the same size because white space was removed from some of the letter-size pages, so some are longer than others.

This explains the issue: https://dustinfreeman.org/blog/pdf-splitting/ However, it doesn't offer a solution for page breaks that are not aligned correctly.

Is there software, or a way to programmatically split the single image into multiple pages within the single PDF document?

Upvotes: 0

Views: 457

Answers (1)

Bobrovsky

Reputation: 14246

I would suggest the following approach:

  1. Create an XObject from the contents of the first page.
  2. Create a number of smaller pages.
  3. Draw the XObject on each page using a negative top offset.

Different parts of the XObject will be visible on different pages. The file size won't increase much because the image is reused rather than duplicated.
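The offset arithmetic behind step 3 can be sketched as follows. This is only a sketch, assuming the default PDF coordinate system (origin at the bottom-left, y increasing upward), that the image fills the whole original page, and that slice bounds are given in image pixels measured from the top. The function name `slice_placements` and its parameters are hypothetical and not part of any PDF library.

```python
def slice_placements(page_height_pt, image_height_px, bounds_px):
    """Return (new_page_height_pt, y_offset_pt) for each slice.

    bounds_px: list of (top_px, bottom_px) rows measured from the image top.
    Assumption: the image spans the full height of the original page.
    """
    scale = page_height_pt / image_height_px  # points per pixel
    placements = []
    for top_px, bottom_px in bounds_px:
        new_height = (bottom_px - top_px) * scale
        # Distance from the bottom of the original page to the slice's bottom edge.
        slice_bottom_pt = page_height_pt - bottom_px * scale
        # Shift the content down so the slice's bottom edge lands at y = 0.
        placements.append((new_height, -slice_bottom_pt))
    return placements
```

Each new page would then get the returned height, and the XObject would be drawn with the returned vertical offset. How the XObject is actually created and drawn depends on the PDF library you use.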

You will need to calculate the top offset and size for each page. You can do this manually, or you can use a computer vision algorithm to find the horizontal black lines; for that you will have to extract the image first. Given the array of coordinates for these lines, you will be able to calculate the page bounds.
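As a rough illustration of that last step, here is a minimal sketch in plain Python. It assumes the image has already been extracted and converted to rows of grayscale values (0 = black, 255 = white), and that a separator line is dark across nearly the full width. The function names and the threshold values are assumptions chosen for illustration and would need tuning on the real file.

```python
def find_line_rows(gray_rows, dark_threshold=50, min_dark_fraction=0.95):
    """Return indices of rows that are almost entirely dark."""
    rows = []
    for y, row in enumerate(gray_rows):
        dark = sum(1 for v in row if v <= dark_threshold)
        if row and dark / len(row) >= min_dark_fraction:
            rows.append(y)
    return rows


def page_bounds(line_rows, image_height):
    """Turn sorted separator rows into (top, bottom) pixel bounds per page.

    A thick line appears as several consecutive rows; each run of
    consecutive rows is treated as a single separator and excluded.
    """
    bounds = []
    top = 0
    i = 0
    while i < len(line_rows):
        start = line_rows[i]
        # Extend over consecutive rows belonging to the same separator.
        while i + 1 < len(line_rows) and line_rows[i + 1] == line_rows[i] + 1:
            i += 1
        end = line_rows[i]
        if start > top:
            bounds.append((top, start))  # content above this separator
        top = end + 1
        i += 1
    if top < image_height:
        bounds.append((top, image_height))  # content after the last separator
    return bounds
```

The bounds are half-open pixel ranges measured from the top of the image, so each page's height is `bottom - top`.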

Upvotes: 1
