DavidR

Reputation: 369

Access PDF hyperlinks with pdf.js

I'm using pdf.js to parse a PDF generated from a Google Doc using Google Scripts. Ultimately, I need to produce a list of the hyperlinks on a given page of the PDF.

I need an equivalent of the pdf.js function PDFpage.getTextContent that also includes hyperlink information, not just text. Any function within pdf.js that outputs hyperlink information would be a start, but I can't find one.

I don't need to display the PDF, just extract minimal information from it.

My current code, which just logs the text content of the page:

function numbersLinks(blob) {
  PDFJS.getDocument({data: blob}).then(function (PDFdoc) {
    for (var i=1; i<=PDFdoc.numPages; i++) {
      PDFdoc.getPage(i).then(function (PDFpage) {
        var page_number = PDFpage.pageIndex + 1;
        PDFpage.getTextContent().then(function (text) {
          for (var j in text.items) {
            var item = text.items[j];
            console.log(item);
          }
        });
      });
    }
  });
}

Upvotes: 0

Views: 2250

Answers (1)

Elsa

Reputation: 724

Is this useful for you?

You can get the URLs from the url property of each annotation object in the array returned by getAnnotations(). Annotations that are not external links have no url property.

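function numbersLinks(blob) {
  PDFJS.getDocument({data: blob}).then(function (PDFdoc) {
    for (var i=1; i<=PDFdoc.numPages; i++) {
      PDFdoc.getPage(i).then(function (PDFpage) {
        // getAnnotations() resolves to an array of annotation objects for this page
        PDFpage.getAnnotations().then(function (annotationData) {
          for (var j=0; j<annotationData.length; j++) {
            // Skip annotations without a url so undefined is not logged
            if (annotationData[j].url) {
              console.log(annotationData[j].url);
            }
          }
        });
      });
    }
  });
}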
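
If you need the links grouped by page rather than just logged, something along these lines might work. This is only a sketch: extractLinkUrls and listLinksByPage are names I made up, not part of pdf.js, and it assumes link annotations are reported with subtype 'Link' and a string url, which may vary between pdf.js versions. The only pdf.js calls it relies on are numPages, getPage() and getAnnotations(), as used above.

```javascript
// Hypothetical helper (not a pdf.js function): keep only annotations that
// look like external links and return their URLs.
function extractLinkUrls(annotations) {
  return annotations
    .filter(function (a) {
      return a.subtype === 'Link' && typeof a.url === 'string';
    })
    .map(function (a) {
      return a.url;
    });
}

// Hypothetical helper: resolves to an array of { page, urls } objects,
// one per page, in page order. pdfDoc is the document object that
// getDocument() resolves to.
function listLinksByPage(pdfDoc) {
  var pending = [];
  for (var i = 1; i <= pdfDoc.numPages; i++) {
    // The immediately invoked function captures the page number, since the
    // promise callbacks run after the loop has finished.
    pending.push((function (pageNumber) {
      return pdfDoc.getPage(pageNumber)
        .then(function (page) {
          return page.getAnnotations();
        })
        .then(function (annotations) {
          return { page: pageNumber, urls: extractLinkUrls(annotations) };
        });
    })(i));
  }
  return Promise.all(pending);
}

// Possible usage with the setup from the question:
// PDFJS.getDocument({data: blob}).then(listLinksByPage).then(function (pages) {
//   console.log(pages);
// });
```

Promise.all keeps the results in the same order as the pages, so you don't have to read the page number back from the page object inside the callbacks.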

Upvotes: 2
