arm93

Reputation: 129

Is it possible to get the HTML of an in-browser PDF viewer?

When you inspect a PDF viewer page in your browser, there is an HTML structure. However, both urllib2 and requests return nothing, and BS4 goes into an infinite loop.

I just want the title (in the head) of the page.

Example page: http://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf

Upvotes: 0

Views: 163

Answers (1)

Adil B

Reputation: 16806

If you're using Mozilla's PDF.js, you should be able to do this via the PDF.js API, as detailed in this issue.

pdf.info.get('Title')

or

var metadata = new Metadata(pdf.catalog.metadata);
metadata.get('dc:title');
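
If you are fetching the URL from Python, as in the question, the server most likely returns the raw PDF bytes rather than HTML. The markup you see in the inspector is generated by the browser's viewer, which would explain why an HTML parser has nothing useful to work with. Below is a minimal sketch of checking for that and making a best-effort guess at the title. `is_pdf` and `naive_pdf_title` are hypothetical helper names, and the regex is an assumption that only holds for an uncompressed, literal-string `/Title` entry without nested parentheses; a proper PDF parsing library is the more reliable route.

```python
import re

PDF_MAGIC = b"%PDF-"


def is_pdf(data: bytes) -> bool:
    """Return True if the payload looks like a PDF rather than HTML."""
    # The header normally sits at offset 0, but some files carry a few
    # leading junk bytes, so search the first 1 KiB.
    return PDF_MAGIC in data[:1024]


def naive_pdf_title(data: bytes):
    """Best-effort title lookup (assumption: uncompressed literal string).

    Returns None when no such entry is found, for example when the
    metadata lives in a compressed object stream or is hex-encoded.
    Escape sequences inside the string are left undecoded.
    """
    match = re.search(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", data)
    if match is None:
        return None
    return match.group(1).decode("latin-1")
```

With requests, the bytes to pass in would be `requests.get(url).content`. If `is_pdf` returns True, skip BS4 entirely and read the document metadata instead.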

Upvotes: 1
