Reputation: 1653
The PDF specification defines standard structure types, which are used to build a structure tree for the document. As far as I can see, none of them relates to pages. Here are the standard structure types for grouping elements:
Document
Part
Art
Sect
Div
...and so on...
Why is there no Page item in this list?
If you want your structure to use pages, what should be used? Part? Sect? Div?
Reputation: 3815
PDF tags exist so that the content type / meaning of elements can be identified. They should be considered a kind of "meta" information for the PDF, providing context for the content in a file (so that the content can be easily extracted, converted, processed, made accessible, etc.). Think of it as a table of contents for a book: if you cut the page height in half, the book goes from x pages to 2x pages, but its content structure stays exactly the same.
A page object in the PDF's document structure already groups elements (each element sits on a given page), so grouping by page again in the structure tree would be somewhat redundant.
Also, consider this case:
etc...
In this example, Section 1 and Section 2 couldn't both be direct parents of page 3 (not to mention that Section 1 spans two different pages). Besides, solving this problem isn't necessary, because each of the elements being grouped here is already a child of its respective Page node in the file's document structure.
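To make the point concrete, here is a toy Python model of a structure tree. It is a simplified sketch, not a real PDF library: `MarkedContentRef`, `StructElem`, `pages_touched` and `sections_by_page` are hypothetical names. Each leaf records the page it lives on and its MCID, roughly what the `/Pg` and MCID entries do in a tagged PDF. Because every leaf already knows its page, the per-page grouping can be derived from the tree, and a section spanning pages needs no Page element.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class MarkedContentRef:
    # Hypothetical stand-in for a marked-content reference: the content
    # tagged with `mcid` in the content stream of page `page` (1-based here).
    page: int
    mcid: int

@dataclass
class StructElem:
    # Hypothetical simplified structure element: a standard type ("Sect",
    # "P", ...), an optional title, and children (elements or references).
    type: str
    title: str = ""
    kids: List[Union["StructElem", MarkedContentRef]] = field(default_factory=list)

def pages_touched(elem: StructElem) -> List[int]:
    """Sorted page numbers on which an element's content appears."""
    pages = set()
    for kid in elem.kids:
        if isinstance(kid, MarkedContentRef):
            pages.add(kid.page)
        else:
            pages.update(pages_touched(kid))
    return sorted(pages)

def sections_by_page(root: StructElem) -> dict:
    """Invert the tree: page number -> titles of top-level elements with content there."""
    result = {}
    for section in root.kids:
        if isinstance(section, StructElem):
            for page in pages_touched(section):
                result.setdefault(page, []).append(section.title)
    return dict(sorted(result.items()))

# Section 1 runs across pages 1 and 2; Section 2 starts on page 2.
doc = StructElem("Document", kids=[
    StructElem("Sect", "Section 1", [
        StructElem("P", kids=[MarkedContentRef(page=1, mcid=0)]),
        StructElem("P", kids=[MarkedContentRef(page=2, mcid=0)]),
    ]),
    StructElem("Sect", "Section 2", [
        StructElem("P", kids=[MarkedContentRef(page=2, mcid=1)]),
    ]),
])

print(pages_touched(doc.kids[0]))  # [1, 2]
print(sections_by_page(doc))       # {1: ['Section 1'], 2: ['Section 1', 'Section 2']}
```

A hypothetical "Page" element would have to be the single parent of page 2's content, which conflicts with that content belonging to two different sections; deriving the page view from the leaves avoids the conflict.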
Reputation: 3184
A PDF organizes its pages in a tree structure (the page tree, which is what lets a viewer load any page so quickly). The content itself has no structure unless you choose to use the format's marked-content feature, which then allows metadata to be included in the data.
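For illustration, marked content is delimited in a page's content stream by the `BDC` and `EMC` operators, and the MCID in the property list is what a structure element points back to. The Python sketch below uses a regex and function name of my own (`BDC_WITH_MCID`, `marked_content_ids`). It only handles the simple case of a decoded content stream with the property list written inline, and it pulls out the tag and MCID of each marked sequence. Text outside any `BDC`/`EMC` pair, like the middle line of the sample, has no structure attached.

```python
import re

# Opening operator of a marked-content sequence that carries an MCID, e.g.
#   /P <</MCID 0>> BDC
# Simplified sketch: assumes a decoded (uncompressed) content stream with the
# property list written inline; a property list given as a name that refers
# to the page's /Properties resources is not matched by this regex.
BDC_WITH_MCID = re.compile(r"/(\w+)\s*<<[^>]*?/MCID\s+(\d+)[^>]*>>\s*BDC")

def marked_content_ids(content: str) -> list:
    """Return (structure tag, MCID) pairs in the order they appear."""
    return [(tag, int(mcid)) for tag, mcid in BDC_WITH_MCID.findall(content)]

page_content = """
/P <</MCID 0>> BDC
BT /F1 12 Tf 72 720 Td (Tagged paragraph) Tj ET
EMC
BT /F1 12 Tf 72 700 Td (Untagged text: no structure attached) Tj ET
/Span <</Lang (en-US) /MCID 1>> BDC
BT /F1 12 Tf 72 680 Td (Tagged span) Tj ET
EMC
"""

print(marked_content_ids(page_content))  # [('P', 0), ('Span', 1)]
```

A real reader would tokenize the stream properly instead of using a regex, since strings or nested dictionaries can contain `>` characters.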
Reputation: 333
Appendix G of the PDF Specification gives examples that demonstrate use of the Page object.