nidu

Reputation: 559

When printing a PDF with puppeteer, how can I find out which page an element is printed on?

I'm printing a PDF document with puppeteer and would like to build a table of contents for all the images and tables in the document. To do that, I need to find out the final page numbers of these images and tables. Is there any way to do it?

Calculating this from a fixed page height sounds quite complicated, because elements can be pushed to another page by CSS rules that prevent page breaks.

Upvotes: 1

Views: 1000

Answers (1)

nidu

Reputation: 559

Found one solution.

In the HTML that is printed to PDF, every element that needs to appear in the TOC gets a unique ID and is preceded by an empty anchor that references it.

<a href="#section_123"></a>
<div id="section_123">Section</div>

This way the generated PDF keeps these IDs.
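If the markup is assembled in Node before it is handed to puppeteer, the anchor-plus-element pair shown above could be produced by a small helper. This is only a minimal sketch: `withTocAnchor` is a hypothetical name (not part of puppeteer), and it assumes IDs made of letters, digits, underscores and hyphens, and that the target is wrapped in a `div`.

```javascript
// Hypothetical helper: returns an empty anchor pointing at `id`,
// followed by a div that carries the same id.
function withTocAnchor(id, innerHtml) {
    // Assumption: only simple IDs are supported, so no escaping is needed.
    if (!/^[A-Za-z][\w-]*$/.test(id)) {
        throw new Error(`Unsupported id: ${id}`);
    }
    return `<a href="#${id}"></a>\n<div id="${id}">${innerHtml}</div>`;
}

console.log(withTocAnchor("section_123", "Section"));
```

The returned string can then be concatenated into the page that puppeteer prints.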

These empty links are written to the PDF as named destinations, which can then be read back with pdfjs-dist.

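// npm install pdfjs-dist
// Assumption: a pdfjs-dist release that can be loaded with require();
// newer releases ship ES modules and may need a different import path in Node.
const pdfjs = require('pdfjs-dist');

// Resolves to [{destination, page}], one entry per named destination in the PDF.
async function getLocalLinkPages(src) {
    const doc = await pdfjs.getDocument(src).promise;
    // destinations represent all the empty links, keyed by element ID
    const destinations = await doc.getDestinations();

    return Promise.all(
        Object.entries(destinations).map(async ([destination, [ref]]) => {
            // ref uniquely identifies the page. It looks like { num: 10, gen: 0 } for example,
            // but we don't have to bother and can just use doc.getPageIndex
            // getPageIndex is zero-based, so add 1 to get the printed page number
            const page = (await doc.getPageIndex(ref)) + 1;
            return {destination, page};
        })
    );
}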

The result looks like this:

[
    {
        "destination": "section_123",
        "page": 4
    },
    {
        "destination": "component_345",
        "page": 5
    }
]
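From there, the table of contents is mostly a matter of attaching titles and sorting by page. A minimal sketch, assuming a hypothetical `buildToc` helper and a hypothetical `titles` map from element ID to display title (neither comes from pdfjs-dist or puppeteer):

```javascript
// Hypothetical helper: turns the output of getLocalLinkPages into TOC rows.
// `titles` is an assumed Map from element ID to the text shown in the TOC.
function buildToc(linkPages, titles) {
    return linkPages
        // Skip destinations that have no TOC title.
        .filter(({destination}) => titles.has(destination))
        // filter() already returned a new array, so sorting does not mutate the input.
        .sort((a, b) => a.page - b.page)
        .map(({destination, page}) => ({title: titles.get(destination), page}));
}

const toc = buildToc(
    [
        {destination: "component_345", page: 5},
        {destination: "section_123", page: 4},
    ],
    new Map([
        ["section_123", "Section"],
        ["component_345", "Component"],
    ])
);
console.log(toc);
```

Because the page numbers are only known after the first render, the TOC would likely have to be inserted in a second puppeteer pass, which can itself shift pages if the TOC changes length.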

Upvotes: 3
