Reputation: 129
When you inspect a PDF viewer page in your browser, there is an HTML structure. However, both urllib2 and requests return nothing, and BS4 goes into an infinite loop.
I just want the title (in the head) of the page.
example page: http://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf
Upvotes: 0
Views: 163
Reputation: 16806
If you're using Mozilla's PDF.js, you should be able to do this via the PDF.js API, as detailed in this issue.
pdf.info.get('Title')
or
var metadata = new Metadata(pdf.catalog.metadata);
metadata.get('dc:title');
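If you are fetching the URL from Python rather than running inside the viewer, note that the server sends the raw PDF bytes; the HTML you see when inspecting the page is generated by the browser's built-in viewer, so there is no `<head>` for BS4 to parse. Below is a minimal sketch, not a full PDF parser: the helper names `looks_like_pdf` and `naive_pdf_title` are hypothetical, and it assumes the title is stored as an uncompressed literal string in the document's Info dictionary. Titles stored as hex or UTF-16 strings, or inside compressed object streams, will not be found this way.

```python
import re


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF magic header."""
    return data.startswith(b"%PDF-")


# Naive pattern for a literal-string entry such as "/Title (Some Book)".
# It allows backslash escapes but does not handle unescaped nested parentheses.
_TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def naive_pdf_title(data: bytes):
    """Best-effort title lookup; returns None if nothing usable is found."""
    if not looks_like_pdf(data):
        return None  # probably HTML or something else, not a PDF
    match = _TITLE_RE.search(data)
    if match is None:
        return None
    # Undo the backslash escapes for "(", ")" and "\" inside the string.
    raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
    # Assumption: the bytes are Latin-1 compatible (not UTF-16 with a BOM).
    return raw.decode("latin-1")
```

You would pass it the response body as bytes (for example `requests.get(url).content`). For anything beyond a quick check, a real PDF library such as pypdf is the safer choice, since it handles the encodings and compression this sketch ignores.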
Upvotes: 1