theta

Reputation: 25631

How to correctly crop PDF with uneven text margins

I have a PDF in which the margins around the text content differ from page to page.

Is there any tool that can correct this for me?

I know Scan Tailor can do this on bitmaps, but this PDF has only a text layer, so I'm not after a solution that involves bitmaps at any stage.


Update:

OK, for me there is no need to try to run PDFCrop on Windows, as its main feature is provided by Ghostscript. This command (taken from the pdfcrop Perl script):

gswin32c.exe -dSAFER -dNOPAUSE -dBATCH -q -r72 -sDEVICE=bbox -f input.pdf 2> bbox.txt

produces a bbox.txt file containing the bounding box of each page's content, i.e. its dimensions as if there were no margins. It looks like this:

%%BoundingBox: 91 259 474 757
%%HiResBoundingBox: 91.000000 259.000000 474.000000 757.000000
%%BoundingBox: 85 224 470 768
%%HiResBoundingBox: 85.000000 224.000000 469.375000 768.000000
%%BoundingBox: 102 217 489 768
%%HiResBoundingBox: 102.000000 217.000000 488.457031 768.000000
...

where the first two numbers are the x,y coordinates of the lower-left corner and the last two are those of the upper-right corner, all measured from the lower-left corner of the page in points (at -r72, one pixel equals one point).
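For illustration, here is a minimal Python sketch of reading that output. The function name parse_bboxes and the choice to keep only the HiResBoundingBox lines are my own assumptions, not anything taken from pdfcrop:

```python
import re

# Matches the high-resolution lines that the bbox device writes to stderr.
HIRES = re.compile(r"^%%HiResBoundingBox:\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_bboxes(text):
    """Return one (llx, lly, urx, ury) tuple of floats per page."""
    boxes = []
    for line in text.splitlines():
        match = HIRES.match(line.strip())
        if match:
            boxes.append(tuple(float(v) for v in match.groups()))
    return boxes


# Sample taken from the first two pages of the output shown above.
sample = """%%BoundingBox: 91 259 474 757
%%HiResBoundingBox: 91.000000 259.000000 474.000000 757.000000
%%BoundingBox: 85 224 470 768
%%HiResBoundingBox: 85.000000 224.000000 469.375000 768.000000
"""
boxes = parse_bboxes(sample)
```

For a real run, the contents of bbox.txt would be read from disk and passed in instead of the sample string.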

This file can be parsed in the language of your choice, the bounding boxes adjusted as desired, and then passed back to Ghostscript, e.g. as described here: Cropping a PDF using Ghostscript 9.01
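One possible way to "correct" the boxes is to grow each one by the same margin on all sides, so that every page ends up with identical margins around its text. The sketch below is only that arithmetic. The helper name pad_box, the 36 pt margin and the A4 page size (595 x 842 pt, origin at 0,0) are assumptions for the example, not values from my file:

```python
def pad_box(box, margin, page_width, page_height):
    """Grow a (llx, lly, urx, ury) box by `margin` points on every side,
    clamped so it never extends past the page (origin assumed at 0,0)."""
    llx, lly, urx, ury = box
    return (
        max(0.0, llx - margin),
        max(0.0, lly - margin),
        min(page_width, urx + margin),
        min(page_height, ury + margin),
    )


# Bounding boxes of the first two pages from the output above.
boxes = [(91.0, 259.0, 474.0, 757.0), (85.0, 224.0, 469.375, 768.0)]

# Assumed values: 36 pt (half an inch) margin on an A4 page.
padded = [pad_box(b, 36.0, 595.0, 842.0) for b in boxes]
```

Note that the resulting pages differ slightly in size, because the text blocks themselves differ in size; only the margins become uniform.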

Upvotes: 1

Views: 998

Answers (1)

mkl

Reputation: 95963

If you are sure that only text is involved (and not images with text drawn on them, or paths drawing symbols), you can quite easily build such a tool in Java using iText (or, most likely, in a .NET language using iTextSharp) with the functionality of the parser package.

The book iText in Action, 2nd edition, in chapter 15.3.4 shows how to find the text margins, and the sample code can be found in ShowTextMargins.java in the SourceForge iText SVN repository.

By manipulating the MediaBox entries of the individual pages you can then adapt the margins as desired.
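As a rough illustration of the arithmetic only (this is not iText code, and the helper name centered_media_box is hypothetical), one way to adapt the margins while keeping every page the same size is to center a box of the desired page size on the text bounding box found for that page. The sketch is in Python for brevity and assumes an A4 target size of 595 x 842 pt:

```python
def centered_media_box(text_box, width, height):
    """Return a (llx, lly, urx, ury) box of the given size whose center
    coincides with the center of the text bounding box."""
    llx, lly, urx, ury = text_box
    center_x = (llx + urx) / 2.0
    center_y = (lly + ury) / 2.0
    return (
        center_x - width / 2.0,
        center_y - height / 2.0,
        center_x + width / 2.0,
        center_y + height / 2.0,
    )


# Text bounding box of one page, target page size assumed to be A4.
box = centered_media_box((91.0, 259.0, 474.0, 757.0), 595.0, 842.0)
```

The resulting coordinates may be negative or extend past the original page; the area outside the original page is simply empty, so the text ends up centered on a page of unchanged size.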

Upvotes: 1
