sorin

Reputation: 170390

How to measure different coordinates from a PDF file on Windows?

I am looking for a way to measure the coordinates of different rectangles in a PDF file.

I have to overprint some text on an existing PDF, so I need to know the x, y, w, h of each area where I am supposed to write the text.

It seems that Preview.app on Mac has this ability, but so far I haven't been able to find anything on Windows that does the same.

Please do not confuse this feature with the Measuring Tools from Adobe Reader which are used to measure distance in printed construction stuff, not the PDF page itself.

It seems that the default unit of measure is the point, so I need something that lets me select a rectangle and then tells me its coordinates.

Please do not suggest exporting the page as an image and using something else to measure the pixels on the image.

Update: http://legacy.activepdf.com/support/knowledgebase/view.cfm?tk=rl&kb=11866 -- PDF Units, that's what I am looking for, something to measure the PDF coordinates in PDF units.

Upvotes: 0

Views: 957

Answers (1)

plinth

Reputation: 49179

Disclaimer: I work for Atalasoft.

I know you said not to suggest this, but honestly, it's the easiest approach:

If you mean "sweep out a rectangle in the UI and report the coordinates", that's pretty straightforward, but it's going to be a build-your-own type of thing. What you will need are:

  1. A PDF rasterizer (GhostScript, Acrobat, FoxIt, Atalasoft) to get you an image at a specific resolution.
  2. A tool to display that image in a window and let you sweep out a rectangle (this is straightforward WinForms-type code for .NET, but we have a control that does this out of the box, combining 1 & 2 into one step).
  3. A tool that can look at the structure of a PDF page and report back the crop box (if any) and the media box for each page (iText, DotPdf).
  4. A tool/understanding of matrix transformations to build the matrix that goes from display space into PDF space (and/or vice versa; probably in iText, definitely in DotPdf).

The code flow becomes something like:

For each page:

  1. Open document, pull out crop and media box, rasterize page, build transformation matrix.
  2. Display image, build/hook into event for selection changing.
  3. Push the image viewer rectangle coordinates through the transformation matrix.
  4. Profit.
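
Step 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any of the libraries named above: it assumes the page is unrotated, the user unit is the default 1/72 inch, and the rasterizer rendered exactly the crop box at a known DPI. The names `Box`, `image_to_pdf_matrix`, `apply` and `pixel_rect_to_pdf` are hypothetical helpers made up for this example. The matrix uses the PDF-style six-number form `(a, b, c, d, e, f)`.

```python
from typing import NamedTuple, Tuple

Matrix = Tuple[float, float, float, float, float, float]


class Box(NamedTuple):
    """A PDF page box in points: lower-left and upper-right corners."""
    llx: float
    lly: float
    urx: float
    ury: float


def image_to_pdf_matrix(crop: Box, dpi: float) -> Matrix:
    """Build the matrix from image pixels to PDF points.

    Assumes no page rotation and that the image covers exactly the crop box.
    Image space has its origin at the top-left with y growing downward;
    PDF space has its origin at the bottom-left with y growing upward,
    hence the negative y scale and the offset by the top edge of the box.
    """
    s = 72.0 / dpi  # points per pixel
    return (s, 0.0, 0.0, -s, crop.llx, crop.ury)


def apply(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Transform one point: [x y 1] times the PDF-style matrix."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def pixel_rect_to_pdf(m: Matrix, left: float, top: float,
                      width: float, height: float
                      ) -> Tuple[float, float, float, float]:
    """Map a swept-out pixel rectangle to (x, y, w, h) in PDF points,
    where (x, y) is the lower-left corner in PDF space."""
    x0, y0 = apply(m, left, top)
    x1, y1 = apply(m, left + width, top + height)
    return (min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))


if __name__ == "__main__":
    # US Letter crop box, rasterized at 144 DPI (2 pixels per point).
    m = image_to_pdf_matrix(Box(0, 0, 612, 792), dpi=144)
    print(pixel_rect_to_pdf(m, left=100, top=200, width=300, height=50))
```

A rotated page or a crop box that differs from the rendered area would need extra terms in the matrix; the structure of the calculation stays the same.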

From a coding point of view (assuming 0 prior knowledge of this, but a decent understanding of linear algebra), expect this to take anywhere from 3 days to 2 weeks. If I were to write it, it would probably take on the order of a few hours, but I wrote most of our PDF tools and this is pretty easy.

If your goal is to intuit where rectangles are on the page and report back those coordinates, that's also doable, but it's decidedly non-trivial in comparison. You need to write code that can rip through a PDF display list and interpret the contents correctly. That means being able to handle all the cumulative matrix transformations, the graphics state changes, the gstate object use, Form XObject placement, and so on. You need to answer the question "what is a rectangle?" because in PDF content it could be an re operator, a set of degenerate beziers, a set of lines, an image of a rectangle or (surprise!) a combination of all of the above. Honestly, intuiting anything about the content on a PDF page is a Herculean task.
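
To give a feel for just the simplest slice of that work, here is a minimal sketch of tracking the cumulative matrix across `q`, `Q` and `cm` and reporting where each `re` rectangle lands. It assumes the content stream has already been decoded and tokenized into `(operator, operands)` pairs, which is itself a hypothetical input format chosen for this example, and it ignores everything else mentioned above: Form XObjects, rectangles built from lines or beziers, images, and whether the path is ever actually painted.

```python
from typing import Iterable, List, Sequence, Tuple

Matrix = Tuple[float, float, float, float, float, float]
Point = Tuple[float, float]

IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m1: Matrix, m2: Matrix) -> Matrix:
    """Return m1 x m2 for PDF-style matrices (points are row vectors)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def transform(m: Matrix, x: float, y: float) -> Point:
    """Transform one point: [x y 1] times the matrix."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def find_re_rectangles(ops: Iterable[Tuple[str, Sequence[float]]]
                       ) -> List[List[Point]]:
    """Return the four transformed corners of every `re` operand set.

    Corners are returned rather than x, y, w, h because a rotating or
    skewing matrix can turn the rectangle into a general quadrilateral.
    """
    ctm = IDENTITY
    stack: List[Matrix] = []
    found: List[List[Point]] = []
    for name, args in ops:
        if name == "q":            # save graphics state
            stack.append(ctm)
        elif name == "Q":          # restore graphics state
            if stack:
                ctm = stack.pop()
        elif name == "cm":         # concatenate: new CTM = M x old CTM
            ctm = multiply(tuple(args), ctm)
        elif name == "re":         # x y width height
            x, y, w, h = args
            corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
            found.append([transform(ctm, px, py) for px, py in corners])
    return found


if __name__ == "__main__":
    ops = [("q", []), ("cm", [2, 0, 0, 2, 10, 20]), ("re", [5, 5, 10, 10]),
           ("Q", []), ("re", [1, 2, 3, 4])]
    for quad in find_re_rectangles(ops):
        print(quad)
```

Even this toy version shows why the full problem is hard: every other way of drawing a rectangle needs its own detection logic on top of the same state tracking.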

Upvotes: 1
