Alvaro Fuentes Zurita

Reputation: 434

pdf2json Page Unit: What is it?

I'm trying to use modesty/pdf2json and the output is very useful, but I'm trying to figure out the measurement units the library uses. They call them "Page Units", and judging by the PDF spec a Page Unit isn't equal to 1/72 inch (a point), because an entire page is 51 Page Units high.

Does anybody know what this Page Unit is? Where can I find info about this measurement?

Many thanks in advance.

Upvotes: 7

Views: 2958

Answers (2)

Matthew Erwin

Reputation: 1223

TL;DR: The important thing to understand is that x, y and element width/height are relative units, tied to the page width/height by a ratio. You can translate them to any destination space by dividing by the page size in the existing units and multiplying by the page size in the desired units.

Here are the boring details:

PDFs don't have a standard "size": you can print anything you like to PDF, which may include landscape or portrait orientation, different page sizes (Standard, A0-A5, Legal, Tabloid, Custom), etc. The size of a PDF page is a physical size in inches, so the translation to pixels (including with pdf2json) is not a fixed "24px" as indicated in @async5's answer.

The key to programmatically getting the results you want is to combine the parsed PDF information (page width and page height) with how you need to render it (pixel count varies with the density of the display, but an "inch" is always an "inch") and with the destination resolution you're targeting.

The same physical device often supports multiple resolutions (changing the logical DPI), so there may be a difference between the native pixel density and the synthesized density set by the user. The basis for translating from PDF units to a local display is therefore a scale factor: the ratio between the resolution of the PDF file and the target dpi of the physically rendered version of it. The same idea applies to PDF parsing libraries, which may use a different DPI than the native "72dpi" of the PDF file itself.

While 96dpi is the Microsoft standard size (72dpi is Apple's standard), choosing either one doesn't give you a correct pixel offset, because pdf2json and pdf.js don't know anything about the end-user display. pdf2json coordinates (x/y) are simply relative measurements of a position on a plane (which is defined by a width/height). So normalizing a position to an 8.5"x11" page at 72dpi would be done as follows:

pdfRect.x = pdfRect.x * ((8.5 * 72) / parsedPdf.formImage.Width);
pdfRect.y = pdfRect.y * ((11 * 72) / parsedPdf.formImage.Pages[0].Height);

This kind of formula works no matter what pdf2json's internal DPI is, or frankly whatever other PDF parsing library you choose to use. That's because dividing by the page size cancels out the library's units, and multiplying brings in whatever units you need. Even if pdf2json internally uses 96dpi and downscales by 1/4 today, and later changes to 72dpi and downscales by 1/2, the math above for converting to the pixel offset and dpi would work independently of that code change.
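To make the "divide by the existing units, multiply by the desired units" idea concrete, here is a minimal sketch of the same ratio as a reusable function. The name scaleRect, the rect shape ({x, y, w, h}) and the page objects ({width, height}) are my own assumptions for illustration, not part of pdf2json's API; the 34 x 44 source page is a made-up example of what a parser might report for a Letter page.

```javascript
// Hypothetical helper (not part of pdf2json): rescale a rectangle from the
// parser's page units into any target coordinate space.
// sourcePage: page size as reported by the parser, in its own units.
// targetPage: the same page expressed in the units you want (points, px, ...).
function scaleRect(rect, sourcePage, targetPage) {
  // The parser's units cancel out in these two ratios.
  const sx = targetPage.width / sourcePage.width;
  const sy = targetPage.height / sourcePage.height;
  return {
    x: rect.x * sx,
    y: rect.y * sy,
    w: rect.w * sx,
    h: rect.h * sy,
  };
}

// Assumed example values: a page reported as 34 x 44 units,
// mapped onto an 8.5" x 11" page at 72dpi (612 x 792).
const letterAt72dpi = { width: 8.5 * 72, height: 11 * 72 };
const parsedPage = { width: 34, height: 44 };
const scaled = scaleRect({ x: 17, y: 22, w: 8.5, h: 11 }, parsedPage, letterAt72dpi);
console.log(scaled); // { x: 306, y: 396, w: 153, h: 198 }
```

Swapping letterAt72dpi for the pixel size of your canvas or image gives pixel offsets with the same function, which is the point: nothing in it depends on the library's internal DPI.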

Hope this is helpful. When I was dealing with this problem it seemed the Internet was missing a spelled-out version of it. Many people were solving specific, concrete source/destination resolution issues (including ones specific to a library) or talking about it in the abstract, but not explaining the relationship very clearly.

Upvotes: 8

async5

Reputation: 2691

Whatever pdf2json produces is not related to PDF.js (PDF.js uses the standard PDF space unit as its base).

So based on https://github.com/modesty/pdf2json/blob/3fe724db05659ad12c2c0f1b019530c906ad23de/lib/pdfunit.js :

  • pdf2json gets data from PDF.js in 96dpi units
  • scales every unit by 1/4

So one page unit equals (96px/inch * 1inch / 4) = 24px.

In your example the height equals 51 * 24px = 1,224px, or 51 * 0.25inch = 12.75inch
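The arithmetic above can be written out as a small sketch. It takes the two figures in this answer at face value (96dpi coming from PDF.js, then a 1/4 scale); the constant and function names are hypothetical and are not exported by pdf2json.

```javascript
// Assumptions from the answer above, not values read from the library at runtime.
const SOURCE_DPI = 96;   // px per inch of the data pdf2json receives
const SCALE_DIVISOR = 4; // pdf2json scales every unit by 1/4
const PX_PER_PAGE_UNIT = SOURCE_DPI / SCALE_DIVISOR; // 24

// Hypothetical helpers for illustration.
function pageUnitsToPixels(units) {
  return units * PX_PER_PAGE_UNIT;
}

function pageUnitsToInches(units) {
  return pageUnitsToPixels(units) / SOURCE_DPI;
}

function pageUnitsToPoints(units) {
  return pageUnitsToInches(units) * 72; // 72 points per inch
}

console.log(pageUnitsToPixels(51)); // 1224
console.log(pageUnitsToInches(51)); // 12.75
console.log(pageUnitsToPoints(51)); // 918
```

If a later pdf2json version changes either figure, only the two constants at the top would need to change.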

Upvotes: 6
