Reputation: 559
I'm printing a PDF document with puppeteer and would like to build a table of contents for all the images and tables in it, so I need to find out the final page numbers of these images and tables. Is there any way to do it?
Calculating them from a fixed page height sounds quite complicated, because elements may move between pages due to CSS rules that prevent page breaks.
Upvotes: 1
Views: 1000
Reputation: 559
I found one solution.
In the document that gets printed to PDF, every element that needs to be in the TOC gets a unique ID and is preceded by an empty anchor referencing it:
<a href="#section_123"></a>
<div id="section_123">Section</div>
This way the generated PDF keeps these IDs.
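The anchors could also be added automatically rather than by hand. The following is only a minimal sketch of that idea: `addTocAnchors`, the `img, table` selector and the `toc_` prefix are hypothetical names, not part of the original solution. It takes a DOM `document` and would need to run in the page context (for example through puppeteer's `page.evaluate`) before the PDF is printed.

```javascript
// Hypothetical helper: gives every matching element an id (if it has none)
// and inserts an empty anchor pointing at that id right before the element.
// The selector and the id prefix are assumptions chosen for illustration.
function addTocAnchors(doc, selector = 'img, table', prefix = 'toc_') {
  const ids = [];
  Array.from(doc.querySelectorAll(selector)).forEach((el, i) => {
    // Keep an existing id, otherwise assign a generated one.
    if (!el.id) el.id = `${prefix}${i}`;
    // Empty link referencing the element, as in the markup above.
    const anchor = doc.createElement('a');
    anchor.setAttribute('href', `#${el.id}`);
    el.parentNode.insertBefore(anchor, el);
    ids.push(el.id);
  });
  return ids;
}
```

The returned list of ids can later be matched against the destination names read from the PDF.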
Then we read the PDF with pdfjs-dist. All these empty links are written to the PDF as destinations.
// npm install pdfjs-dist
// Assumes a CommonJS build of pdfjs-dist; newer releases ship ES modules
// and need an `import` instead of `require`.
const pdfjs = require('pdfjs-dist');

async function getLocalLinkPages(src) {
  const doc = await pdfjs.getDocument(src).promise;
  // destinations represent all the empty links
  const destinations = await doc.getDestinations();
  return Promise.all(
    Object.entries(destinations).map(async ([destination, [ref]]) => {
      // ref uniquely identifies the page. It looks like { num: 10, gen: 0 } for example,
      // but we don't have to bother and can just use doc.getPageIndex
      // getPageIndex is zero-based, so add 1 to get the printed page number
      const page = (await doc.getPageIndex(ref)) + 1;
      return {destination, page};
    })
  );
}
The result looks like this:
[
{
"destination": "section_123",
"page": 4
},
{
"destination": "component_345",
"page": 5
}
]
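From a list like this, the TOC entries can be assembled in page order. This is a minimal sketch, not part of the original solution: `buildToc` and the `titles` map (id to caption) are hypothetical, and where the captions come from is left open.

```javascript
// Hypothetical helper: turns the { destination, page } list into TOC entries
// sorted by page. `titles` maps a destination id to a caption (an assumption);
// entries without a caption fall back to the id itself.
function buildToc(entries, titles = {}) {
  return [...entries]
    .sort((a, b) => a.page - b.page)
    .map(({ destination, page }) => ({
      title: titles[destination] ?? destination,
      page,
    }));
}

const toc = buildToc(
  [
    { destination: 'component_345', page: 5 },
    { destination: 'section_123', page: 4 },
  ],
  { section_123: 'Section' }
);
```

Copying the input before sorting keeps the original result array unchanged.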
Upvotes: 3