Reputation: 369
I'm using pdf.js to parse a PDF generated from a Google Doc using Google Scripts. Ultimately, I need to produce a list of the hyperlinks on a given page of the PDF.
I need an equivalent to the pdf.js function PDFpage.getTextContent that includes hyperlink information, not just text. Any function within pdf.js that outputs hyperlink information would be a start, but I can't seem to find one.
I don't need to display the PDF, just extract minimal information from it.
My current code, which just logs the text content of the page:
function numbersLinks(blob) {
  PDFJS.getDocument({data: blob}).then(function (PDFdoc) {
    for (var i = 1; i <= PDFdoc.numPages; i++) {
      PDFdoc.getPage(i).then(function (PDFpage) {
        var page_number = PDFpage.pageIndex + 1;
        PDFpage.getTextContent().then(function (text) {
          for (var j in text.items) {
            var item = text.items[j];
            console.log(item);
          }
        });
      });
    }
  });
}
Upvotes: 0
Views: 2250
Reputation: 724
Does this help? You can get the URLs from the url key of each entry in annotationData, the array returned by getAnnotations().
function numbersLinks(blob) {
  PDFJS.getDocument({data: blob}).then(function (PDFdoc) {
    for (var i = 1; i <= PDFdoc.numPages; i++) {
      PDFdoc.getPage(i).then(function (PDFpage) {
        PDFpage.getAnnotations().then(function (annotationData) {
          for (var j = 0; j < annotationData.length; j++) {
            console.log(annotationData[j].url);
          }
        });
      });
    }
  });
}
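Since the goal is a list of links per page rather than console output, here is a rough sketch of how the same getPage and getAnnotations calls could be collected into one array. The helper names collectLinks and linksOnPage are made up for this example, and it assumes that annotations without an external target (for instance in-document links or comments) simply have no string url, so they are skipped.

```javascript
// Hypothetical helper: resolves with the external links found on one page.
// PDFdoc is assumed to be the document object that getDocument resolves with.
function linksOnPage(PDFdoc, pageNumber) {
  return PDFdoc.getPage(pageNumber).then(function (PDFpage) {
    return PDFpage.getAnnotations();
  }).then(function (annotationData) {
    return annotationData.filter(function (annotation) {
      // Assumption: only annotations pointing at a URL carry a string `url`.
      return typeof annotation.url === "string";
    }).map(function (annotation) {
      return { page: pageNumber, url: annotation.url };
    });
  });
}

// Hypothetical helper: resolves with the links of all pages, in page order.
function collectLinks(PDFdoc) {
  var pagePromises = [];
  for (var i = 1; i <= PDFdoc.numPages; i++) {
    pagePromises.push(linksOnPage(PDFdoc, i));
  }
  return Promise.all(pagePromises).then(function (perPage) {
    // Flatten the per-page arrays into a single list.
    return [].concat.apply([], perPage);
  });
}
```

Inside the getDocument callback from the answer, this would be called as collectLinks(PDFdoc).then(function (links) { console.log(links); }). Passing pageNumber into the helper avoids relying on the loop variable i inside the asynchronous callbacks, where it would already have reached its final value.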
Upvotes: 2