6:[["$","$Le",null,{}],["$","div",null,{"className":"min-h-screen bg-gray-100 p-6","children":[["$","$Lf",null,{}],["$","script",null,{"type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"{\"@context\":\"https://schema.org\",\"@type\":\"QAPage\",\"mainEntity\":{\"@type\":\"Question\",\"name\":\"Is it possible to get the HTML of an in-browser PDF viewer?\",\"text\":\"

When you inspect a PDF viewer page in your browser, there is an HTML structure. However, both urllib2 and requests return nothing, and BS4 goes into an infinite loop.

\\n\\n

I just want the title (in the head) of the page.

\\n\\n

example page:\\nhttp://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf

\n\",\"author\":{\"@type\":\"Person\",\"name\":\"arm93\"},\"upvoteCount\":0,\"answerCount\":1,\"acceptedAnswer\":null}}"}}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mb-6 relative","children":[["$","div",null,{"className":"absolute top-4 right-4 flex flex-wrap space-x-2","children":[["$","span","html",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/html/1","children":"html"}]}],["$","span","pdf",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/pdf/1","children":"pdf"}]}],["$","span","web-scraping",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/web-scraping/1","children":"web-scraping"}]}]]}],["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/ea898295e4a5841640d5bb6166067745?s=256&d=identicon&r=PG&f=y&so-version=2","alt":"arm93","className":"w-16 h-16 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/5727612/arm93","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"arm93"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",129]}]]}]]}],["$","h1",null,{"className":"text-2xl font-bold text-gray-800 mb-4","children":"Is it possible to get the HTML of an in-browser PDF viewer?"}],["$","p",null,{"className":"text-gray-700 mt-4","dangerouslySetInnerHTML":{"__html":"

When you inspect a PDF viewer page in your browser, there is an HTML structure. However, both urllib2 and requests return nothing, and BS4 goes into an infinite loop.

\n\n

I just want the title (in the head) of the page.

\n\n

example page:\nhttp://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm mt-4","children":[["$","p",null,{"children":["Upvotes: ",0]}],["$","p",null,{"children":["Views: ",163]}]]}]]}],["$","div",null,{"className":"container mx-auto","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-6","children":["Answers (",1,")"]}],[["$","div","47850458",{"className":"bg-white shadow-md rounded-lg p-6 mb-6","children":[["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://i.sstatic.net/jYLJl.jpg?s=256","alt":"Adil B","className":"w-12 h-12 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/866021/adil-b","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Adil B"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",16806]}]]}]]}],["$","p",null,{"className":"text-gray-700 mb-4","dangerouslySetInnerHTML":{"__html":"

If you're using Mozilla's PDF.js, you should be able to do this through the PDF.js API, as detailed in this issue.

\n\n

pdf.info.get('Title')\n

\n\n

const metadata = new Metadata(pdf.catalog.metadata);\nmetadata.get('dc:title');\n

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm","children":["$","p",null,{"children":["Upvotes: ",1]}]}]]}]]]}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mt-6","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-4","children":"Related Questions"}],["$","ul",null,{"className":"list-disc list-inside","children":[["$","li","71875653",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/71875653","className":"text-blue-600 hover:underline","children":"Extract / scrap data from PDF with python"}]}],["$","li","54793831",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/54793831","className":"text-blue-600 hover:underline","children":"How can I download a PDF file from an URL where the PDF is embedded into the HTML?"}]}],["$","li","7383644",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/7383644","className":"text-blue-600 hover:underline","children":"HTML PDF Viewer"}]}],["$","li","1808078",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/1808078","className":"text-blue-600 hover:underline","children":"Is is possible to Download pdf file through pure html?"}]}],["$","li","8540875",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/8540875","className":"text-blue-600 hover:underline","children":"How to scrape information from PDFs?"}]}],["$","li","22707947",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/22707947","className":"text-blue-600 hover:underline","children":"Is it possible to read the pdf in html?"}]}],["$","li","5189766",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/5189766","className":"text-blue-600 hover:underline","children":"How to read a pdf file by using 
HTML5?"}]}],["$","li","2881182",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/2881182","className":"text-blue-600 hover:underline","children":"Read PDF through Java and get the HTML Content"}]}],["$","li","6252541",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/6252541","className":"text-blue-600 hover:underline","children":"PDF to HTML or similar"}]}],["$","li","5279274",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/5279274","className":"text-blue-600 hover:underline","children":"Screen-scraping for PDF links to download"}]}]]}]]}]]}],["$","$L11",null,{}],["$","$L12",null,{}],["$","$L13",null,{}],["$","$L14",null,{}],["$","$L15",null,{}]]
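For context on the question: a `.pdf` URL such as the example page normally serves the PDF file itself (typically with `Content-Type: application/pdf`), not an HTML page. The DOM you see in the inspector is generated by the browser's own viewer (in Firefox, Mozilla's PDF.js), so urllib2 and requests receive PDF bytes and BeautifulSoup has no real HTML to parse. The sketch below is a swapped-in, standard-library-only alternative to the PDF.js API from the answer: it downloads the bytes and does a naive regex scan for the `/Title` entry of the document information dictionary, also looking inside Flate-compressed streams. The helper names (`fetch`, `is_pdf`, `pdf_title`) and the `User-Agent` value are hypothetical. It is not a real PDF parser: it ignores encryption, nested unescaped parentheses, and most string escapes, and its fallback may pick up a bookmark title instead of the document title.

```python
import re
import urllib.request
import zlib
from itertools import chain
from typing import Iterator, Optional

# Naive byte-level patterns; this is NOT a real PDF parser.
INFO_REF = re.compile(rb'/Info\s+(\d+)\s+(\d+)\s+R')
TITLE_LITERAL = re.compile(rb'/Title\s*\(((?:[^\\()]|\\.)*)\)', re.S)
TITLE_HEX = re.compile(rb'/Title\s*<([0-9A-Fa-f\s]*)>')
STREAM = re.compile(rb'stream\r?\n(.*?)endstream', re.S)


def fetch(url: str) -> bytes:
    """Download the raw body; for a .pdf URL this is PDF data, not HTML."""
    # Hypothetical header value; some servers reject urllib's default agent.
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def is_pdf(body: bytes) -> bool:
    """PDF files start with a %PDF- header (readers tolerate a few leading bytes)."""
    return b'%PDF-' in body[:1024]


def _decode_text(raw: bytes) -> str:
    # UTF-16BE when the string starts with a byte-order mark; otherwise
    # PDFDocEncoding, approximated here as Latin-1.
    if raw.startswith(b'\xfe\xff'):
        return raw[2:].decode('utf-16-be', errors='replace')
    return raw.decode('latin-1')


def _title_in(chunk: bytes) -> Optional[str]:
    literal = TITLE_LITERAL.search(chunk)
    if literal:
        # Only the \( \) and \\ escapes are undone in this sketch.
        return _decode_text(re.sub(rb'\\([()\\])', rb'\1', literal.group(1)))
    hex_match = TITLE_HEX.search(chunk)
    if hex_match:
        digits = re.sub(rb'\s', b'', hex_match.group(1)).decode('ascii')
        if len(digits) % 2:
            digits += '0'  # a missing final hex digit counts as 0
        return _decode_text(bytes.fromhex(digits))
    return None


def _info_object(body: bytes) -> Optional[bytes]:
    # The trailer (or xref stream dictionary) points at the information
    # dictionary via "/Info N G R"; the last reference wins after updates.
    refs = INFO_REF.findall(body)
    if not refs:
        return None
    num, gen = refs[-1]
    obj_re = re.compile(rb'(?<![0-9])' + num + rb'\s+' + gen + rb'\s+obj(.*?)endobj', re.S)
    objs = obj_re.findall(body)
    return objs[-1] if objs else None


def _inflated_streams(body: bytes) -> Iterator[bytes]:
    # Compressed object streams (PDF 1.5+) can hide the Info dictionary.
    for match in STREAM.finditer(body):
        try:
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded, or unreadable
        yield data


def pdf_title(body: bytes) -> Optional[str]:
    """Best-effort /Title lookup; returns None when nothing is found."""
    if not is_pdf(body):
        return None
    info = _info_object(body)
    if info is not None:
        found = _title_in(info)
        if found is not None:
            return found
    # Fallback: the first /Title anywhere, which may be a bookmark title.
    for chunk in chain([body], _inflated_streams(body)):
        found = _title_in(chunk)
        if found is not None:
            return found
    return None
```

Assuming the example URL from the question is still reachable, `pdf_title(fetch('http://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf'))` would return whatever `/Title` string the file carries, or `None` if the scan finds none. For anything beyond a quick check, a dedicated PDF library is the more robust choice.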

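The second PDF.js snippet in the answer reads `dc:title` from the document's XMP metadata rather than from the information dictionary. A rough Python counterpart (standard library only, and not the PDF.js `Metadata` class) is to locate the XMP packet in the raw bytes and read the `x-default` entry of `dc:title` with `xml.etree.ElementTree`. This sketch assumes the metadata stream is stored uncompressed, is UTF-8 encoded, and is wrapped in an `x:xmpmeta` element that declares its namespace; `xmp_dc_title` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs defined by the RDF and Dublin Core specifications.
NS = {
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc': 'http://purl.org/dc/elements/1.1/',
}
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'
XMP_PACKET = re.compile(rb'<x:xmpmeta\b.*?</x:xmpmeta>', re.S)


def xmp_dc_title(body: bytes) -> Optional[str]:
    """Return dc:title from an uncompressed XMP packet, or None."""
    match = XMP_PACKET.search(body)
    if not match:
        return None
    try:
        root = ET.fromstring(match.group(0))
    except ET.ParseError:
        return None  # e.g. undeclared prefixes or a non-UTF-8 packet
    # dc:title is a language alternative: an rdf:Alt holding rdf:li items.
    items = root.findall('.//dc:title/rdf:Alt/rdf:li', NS)
    if not items:
        return None
    # Prefer the x-default entry; otherwise take the first language variant.
    preferred = [li for li in items if li.get(XML_LANG) == 'x-default']
    return (preferred or items)[0].text
```

Which source is more reliable varies from file to file, so one option is to try this first and fall back to the `/Title` entry of the information dictionary.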